1
Lee CT, Zhang K, Li W, Tang K, Ling Y, Walji MF, Jiang X. Identifying predictors of the tooth loss phenotype in a large periodontitis patient cohort using a machine learning approach. J Dent 2024; 144:104921. [PMID: 38437976 DOI: 10.1016/j.jdent.2024.104921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 02/17/2024] [Accepted: 03/01/2024] [Indexed: 03/06/2024]
Abstract
OBJECTIVES This study aimed to identify predictors associated with the tooth loss phenotype in a large periodontitis patient cohort in the university setting. METHODS Information on periodontitis patients and nineteen factors identified at the initial visit was extracted from electronic health records. The primary outcome is tooth loss phenotype (presence or absence of tooth loss). Prediction models were built on significant factors (single or combinatory) selected by the RuleFit algorithm, and these factors were further adopted by regression models. Model performance was evaluated by Area Under the Receiver Operating Characteristic Curve (AUROC) and Area Under the Precision-Recall Curve (AUPRC). Associations between predictors and the tooth loss phenotype were also evaluated by classical statistical approaches to validate the performance of machine learning models. RESULTS In total, 7840 patients were included. The machine learning model predicting the tooth loss phenotype achieved AUROC of 0.71 and AUPRC of 0.66. Age, periodontal diagnosis, number of missing teeth at baseline, furcation involvement, and tooth mobility were associated with the tooth loss phenotype in both machine learning and classical statistical models. CONCLUSIONS The rule-based machine learning approach improves model explainability compared to classical statistical methods. However, the model's generalizability needs to be further validated by external datasets. CLINICAL SIGNIFICANCE Predictors identified by the current machine learning approach using the RuleFit algorithm had clinically relevant thresholds in predicting the tooth loss phenotype in a large and diverse periodontitis patient cohort. The results of this study will assist clinicians in performing risk assessment for periodontitis at the initial visit.
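As an illustration of the two evaluation metrics named in this abstract, the following minimal sketch computes AUROC from pairwise comparisons and AUPRC as average precision, which is one common estimator of that area. The function names, labels, and scores are hypothetical toy values chosen for illustration; they are not data or code from the study, and tied scores are handled only in the simplest way.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits


# Hypothetical toy data: 1 = tooth loss, 0 = no tooth loss.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))              # 0.75
print(average_precision(labels, scores))  # 5/6, about 0.833
```

AUPRC is reported alongside AUROC because it is more sensitive to how well the positive class is ranked when classes are imbalanced.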
Affiliation(s)
- Chun-Teh Lee
- Department of Periodontics and Dental Hygiene, The University of Texas Health Science Center at Houston School of Dentistry, 7500 Cambridge Street, Houston, TX 77054, USA
- Kai Zhang
- The University of Texas Health Science Center at Houston School of Biomedical Informatics, 7000 Fannin St, Houston, Texas 77030, USA
- Wen Li
- Division of Clinical and Translational Sciences, Department of Internal Medicine, the University of Texas McGovern Medical School at Houston, 6431 Fannin St, Houston, Texas, USA; Biostatistics/Epidemiology/Research Design (BERD) Component, Center for Clinical and Translational Sciences (CCTS), University of Texas Health Science Center at Houston, 7000 Fannin St, Houston, Texas 77030, USA
- Kaichen Tang
- The University of Texas Health Science Center at Houston School of Biomedical Informatics, 7000 Fannin St, Houston, Texas 77030, USA
- Yaobin Ling
- The University of Texas Health Science Center at Houston School of Biomedical Informatics, 7000 Fannin St, Houston, Texas 77030, USA
- Muhammad F Walji
- The University of Texas Health Science Center at Houston School of Biomedical Informatics, 7000 Fannin St, Houston, Texas 77030, USA; Department of Diagnostic and Biomedical Sciences, The University of Texas Health Science Center at Houston School of Dentistry, 7000 Fannin St., Houston, Texas 77030, USA
- Xiaoqian Jiang
- The University of Texas Health Science Center at Houston School of Biomedical Informatics, 7000 Fannin St, Houston, Texas 77030, USA.
2
Cerono G, Chicco D. Ensemble machine learning reveals key features for diabetes duration from electronic health records. PeerJ Comput Sci 2024; 10:e1896. [PMID: 38435625 PMCID: PMC10909161 DOI: 10.7717/peerj-cs.1896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Accepted: 01/30/2024] [Indexed: 03/05/2024]
Abstract
Diabetes is a metabolic disorder that affects more than 420 million people worldwide, and it is caused by the presence of a high level of sugar in the blood for a long period. Diabetes can have serious long-term health consequences, such as cardiovascular diseases, strokes, chronic kidney diseases, foot ulcers, retinopathy, and others. Although common, this disease is hard to spot, because it often comes with no symptoms. Especially for type 2 diabetes, which occurs mainly in adults, knowing how long the diabetes has been present in a patient can have a strong impact on the treatment they can receive. This information, although pivotal, might be absent: for some patients, in fact, the year when they received the diabetes diagnosis might be well known, but the year of disease onset might be unknown. In this context, machine learning applied to electronic health records can be an effective tool to predict the past duration of diabetes for a patient. In this study, we applied a regression analysis based on several computational intelligence methods to a dataset of electronic health records of 73 patients with type 1 diabetes with 20 variables and another dataset of records of 400 patients with type 2 diabetes with 49 variables. Among the algorithms applied, Random Forests outperformed the others and efficiently predicted diabetes duration for both cohorts, with the regression performance measured through the coefficient of determination R². Afterwards, we applied the same method for feature ranking, and we detected the factors in the clinical records most strongly correlated with past diabetes duration: age, insulin intake, and body-mass index. Our findings can have a profound impact on clinical practice: when the information about a patient's duration of diabetes is missing, medical doctors can use our tool and focus on age, insulin intake, and body-mass index to infer this important aspect. Regarding limitations, unfortunately we were unable to find an additional dataset of EHRs of patients with diabetes having the same variables as the two analyzed here, so we could not verify our findings on a validation cohort.
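For readers unfamiliar with the coefficient of determination used to score the regression models above, a minimal sketch follows. The function name and the duration values (in years) are hypothetical and are not taken from the study's datasets.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical diabetes durations in years.
true_years = [2, 5, 10, 15]
print(r_squared(true_years, [2, 5, 10, 15]))  # 1.0: perfect prediction
print(r_squared(true_years, [8, 8, 8, 8]))    # 0.0: no better than predicting the mean
```

A value near 1 means the model explains most of the variance in duration; a value at or below 0 means it does no better than always predicting the cohort mean.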
Affiliation(s)
- Gabriel Cerono
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Canada
- Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
3
Mette C, Verboux D, Rachas A, Debeugny G. Predicting the risk of becoming eligible for the disability pension: Machine learning methods applied to French health data. Sante Publique 2024; 35:65-85. [PMID: 38388403 DOI: 10.3917/spub.236.0065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2024]
Abstract
Introduction Benefiting from the disability pension implies morbid (physical and psychological) and social (fall in income) implications for the person. It also has economic consequences for society, with increasing expenses since 2011 (+4.9% on average per year). Investing in preventive actions against the loss of the ability to work should limit these consequences, but it requires targeting people at risk. The development of artificial intelligence opens up prospects in this regard. Purpose of the Research To target, using supervised machine learning methods, those people with a high probability of becoming eligible for the disability pension over the course of the year based on their socio-demographic and medical characteristics (pathologies, work stoppages, drugs taken, and medical procedures). Method Among the beneficiaries of the French public welfare system aged 20–64 in 2017, we compared the socio-demographic and medical characteristics between 2014 and 2016 of those who received a disability pension in 2017 and not before, and those who did not receive a disability pension from 2014 to 2017. The determination of the boundary between these two groups was tested using logistic regression, decision trees, random forests, naive Bayes classifiers, and support vector machines. The models’ performance was compared with respect to accuracy, precision, sensitivity, specificity, and AUC (area under the curve). Finally, the predictive power of each factor was measured by AUC too. Results The boosted logistic regression had the best performance for three of the five criteria, but low sensitivity. The best sensitivity was obtained with the support vector machines, with an accuracy close to that of the boosted logistic regression, but a lower precision and specificity. Random forests offered the best discriminatory ability. The naive Bayes classifier had the worst performance. 
The most predictive factors in becoming eligible for the disability pension were having 30 days or more off sick in 2014, 2015, and 2016 and being aged 55 to 64. Conclusion Supervised learning methods have appeared relevant for identifying people with the highest probability of becoming eligible for the disability pension and, more broadly, for steering public and social policies.
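The trade-off described in these results, where a model can score well on accuracy and specificity yet poorly on sensitivity, follows directly from how the criteria are defined on a confusion matrix. The sketch below is illustrative only: the function name and the eight toy labels are assumptions, not the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical labels: 1 = became eligible for the pension, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # specificity is 0.8 while sensitivity is only 1/3
```

In this toy case the classifier misses two of three positives, so sensitivity is low even though most negatives are classified correctly.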
Affiliation(s)
- Corinne Mette
- Direction de la stratégie, des études et des statistiques, Caisse nationale de l’Assurance maladie, Paris, France
- Dorian Verboux
- Direction de la stratégie, des études et des statistiques, Caisse nationale de l’Assurance maladie, Paris, France
- Laboratoire ERUDITE, université Paris-Est, faculté de sciences économiques et de gestion, Créteil, France
- Antoine Rachas
- Direction de la stratégie, des études et des statistiques, Caisse nationale de l’Assurance maladie, Paris, France
- Gonzague Debeugny
- Direction de la stratégie, des études et des statistiques, Caisse nationale de l’Assurance maladie, Paris, France
4
Malara N, Coluccio ML, Grillo F, Ferrazzo T, Garo NC, Donato G, Lavecchia A, Fulciniti F, Sapino A, Cascardi E, Pellegrini A, Foxi P, Furlanello C, Negri G, Fadda G, Capitanio A, Pullano S, Garo VM, Ferrazzo F, Lowe A, Torsello A, Candeloro P, Gentile F. Multicancer screening test based on the detection of circulating non haematological proliferating atypical cells. Mol Cancer 2024; 23:32. [PMID: 38350884 PMCID: PMC10863189 DOI: 10.1186/s12943-024-01951-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 01/30/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND The problem in early diagnosis of sporadic cancer is understanding the individual's risk of developing disease. In response to this need, global scientific research is focusing on developing predictive models based on non-invasive screening tests. A tentative solution to the problem may be a blood-based cancer screening test able to discover those cell requirements triggering subclinical and clinical onset latency, at the stage when the cell disorder, i.e. atypical epithelial hyperplasia, is still in a subclinical stage of proliferative dysregulation. METHODS A well-established procedure to identify proliferating circulating tumor cells was deployed to measure the cell proliferation of circulating non-haematological cells, which may suggest tumor pathology. Moreover, the data collected were processed by a supervised machine learning model to make the prediction. RESULTS The developed test combining circulating non-haematological cell proliferation data and artificial intelligence shows 98.8% accuracy, 100% sensitivity, and 95% specificity. CONCLUSION This proof-of-concept study demonstrates that integration of innovative non-invasive methods and predictive models can be decisive in assessing the health status of an individual and can achieve cutting-edge results in cancer prevention and management.
Affiliation(s)
- Natalia Malara
- Department of Health Sciences, University Magna Graecia, Catanzaro, IT, Italy.
- Maria Laura Coluccio
- Department of Experimental and Clinical Medicine, University Magna Graecia, Catanzaro, IT, Italy
- Fabiana Grillo
- Department of Chemistry, University of Leicester, Leicester, UK
- Teresa Ferrazzo
- Department of Health Sciences, University Magna Graecia, Catanzaro, IT, Italy
- Nastassia C Garo
- Department of Health Sciences, University Magna Graecia, Catanzaro, IT, Italy
- Giuseppe Donato
- Department of Health Sciences, University Magna Graecia, Catanzaro, IT, Italy
- Anna Sapino
- Candiolo Cancer Institute, FPO-IRCCS, Candiolo (TO), Turin, Italy
- Eliano Cascardi
- Candiolo Cancer Institute, FPO-IRCCS, Candiolo (TO), Turin, Italy
- Antonella Pellegrini
- Società Italiana di Citologia (SICi), AO S.Giovanni-Addolorata, President, Roma, IT, Italy
- Prassede Foxi
- Cytodiagnostic Pistoia-Pescia Unit, USL Toscana Centro, Pistoia, IT, 51100, Italy
- Giovanni Negri
- Pathology Unit, Central Hospital Bolzano, via Boehler 5, Bolzano, IT, 39100, Italy
- Guido Fadda
- Human Pathology Department, Gaetano Barresi University, Messina, IT, Italy
- Arrigo Capitanio
- Linköping University Hospital SE, Linköping University, Linköping, Sweden
- Salvatore Pullano
- Department of Health Sciences, University Magna Graecia, Catanzaro, IT, Italy
- Virginia M Garo
- Department of Health Sciences, University Magna Graecia, Catanzaro, IT, Italy
- Francesca Ferrazzo
- Department of Health Sciences, University Magna Graecia, Catanzaro, IT, Italy
- Alarice Lowe
- Department of Pathology, Stanford University Hospital, Stanford, CA, USA
- Patrizio Candeloro
- Department of Experimental and Clinical Medicine, University Magna Graecia, Catanzaro, IT, Italy
- Francesco Gentile
- Department of Experimental and Clinical Medicine, University Magna Graecia, Catanzaro, IT, Italy
5
Faviez C, Vincent M, Garcelon N, Boyer O, Knebelmann B, Heidet L, Saunier S, Chen X, Burgun A. Performance and clinical utility of a new supervised machine-learning pipeline in detecting rare ciliopathy patients based on deep phenotyping from electronic health records and semantic similarity. Orphanet J Rare Dis 2024; 19:55. [PMID: 38336713 PMCID: PMC10858490 DOI: 10.1186/s13023-024-03063-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 02/03/2024] [Indexed: 02/12/2024]
Abstract
BACKGROUND Rare diseases affect approximately 400 million people worldwide. Many of them suffer from delayed diagnosis. Among them, NPHP1-related renal ciliopathies need to be diagnosed as early as possible as potential treatments have been recently investigated with promising results. Our objective was to develop a supervised machine learning pipeline for the detection of NPHP1 ciliopathy patients from a large number of nephrology patients using electronic health records (EHRs). METHODS AND RESULTS We designed a pipeline combining a phenotyping module re-using unstructured EHR data, a semantic similarity module to address the phenotype dependence, a feature selection step to deal with high dimensionality, an undersampling step to address the class imbalance, and a classification step with multiple train-test split for the small number of rare cases. The pipeline was applied to thirty NPHP1 patients and 7231 controls and achieved good performances (sensitivity 86% with specificity 90%). A qualitative review of the EHRs of 40 misclassified controls showed that 25% had phenotypes belonging to the ciliopathy spectrum, which demonstrates the ability of our system to detect patients with similar conditions. CONCLUSIONS Our pipeline reached very encouraging performance scores for pre-diagnosing ciliopathy patients. The identified patients could then undergo genetic testing. The same data-driven approach can be adapted to other rare diseases facing underdiagnosis challenges.
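The undersampling step and the repeated train-test splits mentioned in this abstract can be sketched as follows. This is not the authors' pipeline: the function names, the 30% test fraction, the 1:1 control-to-case ratio, and the choice to undersample only the training portion are all assumptions made for illustration. Only the cohort sizes (30 cases, 7231 controls) come from the abstract.

```python
import random


def split(items, test_fraction, rng):
    """Shuffle a copy of items and return (train, test)."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def repeated_undersampled_splits(cases, controls, n_splits=5,
                                 test_fraction=0.3, ratio=1, seed=0):
    """Yield (train, test) pairs; controls are undersampled in training only."""
    rng = random.Random(seed)
    for _ in range(n_splits):
        case_train, case_test = split(cases, test_fraction, rng)
        ctrl_train, ctrl_test = split(controls, test_fraction, rng)
        # Keep at most `ratio` controls per case to reduce class imbalance.
        n_keep = min(len(ctrl_train), ratio * len(case_train))
        ctrl_train = rng.sample(ctrl_train, n_keep)
        yield case_train + ctrl_train, case_test + ctrl_test


# Placeholder patient identifiers; cohort sizes follow the abstract.
cases = [("case", i) for i in range(30)]
controls = [("control", i) for i in range(7231)]
for train, test in repeated_undersampled_splits(cases, controls, n_splits=3):
    print(len(train), len(test))
```

Leaving the test portion at its natural imbalance keeps the reported sensitivity and specificity representative of the screening setting, while repeating the split reduces the dependence of the estimate on which of the few rare cases land in the test set.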
Affiliation(s)
- Carole Faviez
- Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France.
- Inria, 75012, Paris, France.
- Marc Vincent
- Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015, Paris, France
- Nicolas Garcelon
- Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
- Inria, 75012, Paris, France
- Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015, Paris, France
- Olivia Boyer
- Department of Pediatric Nephrology, APHP-Centre, Reference Center for Inherited Renal Diseases (MARHEA), Imagine Institute, Hôpital Necker-Enfants Malades, Université Paris Cité, 75015, Paris, France
- Laboratory of Renal Hereditary Diseases, INSERM UMR 1163, Imagine Institute, Université Paris Cité, 75015, Paris, France
- Bertrand Knebelmann
- Nephrology and Transplantation Department, MARHEA, Hôpital Necker-Enfants Malades, AP-HP, Université Paris Cité, 75015, Paris, France
- Laurence Heidet
- Department of Pediatric Nephrology, APHP-Centre, Reference Center for Inherited Renal Diseases (MARHEA), Imagine Institute, Hôpital Necker-Enfants Malades, Université Paris Cité, 75015, Paris, France
- Sophie Saunier
- Laboratory of Renal Hereditary Diseases, INSERM UMR 1163, Imagine Institute, Université Paris Cité, 75015, Paris, France
- Xiaoyi Chen
- Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
- Inria, 75012, Paris, France
- Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015, Paris, France
- Anita Burgun
- Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
- Inria, 75012, Paris, France
- Département d'informatique Médicale, Hôpital Necker-Enfants Malades, AP-HP, 75015, Paris, France
6
Buchlak QD, Tang CHM, Seah JCY, Johnson A, Holt X, Bottrell GM, Wardman JB, Samarasinghe G, Dos Santos Pinheiro L, Xia H, Ahmad HK, Pham H, Chiang JI, Ektas N, Milne MR, Chiu CHY, Hachey B, Ryan MK, Johnston BP, Esmaili N, Bennett C, Goldschlager T, Hall J, Vo DT, Oakden-Rayner L, Leveque JC, Farrokhi F, Abramson RG, Jones CM, Edelstein S, Brotchie P. Effects of a comprehensive brain computed tomography deep learning model on radiologist detection accuracy. Eur Radiol 2024; 34:810-822. [PMID: 37606663 PMCID: PMC10853361 DOI: 10.1007/s00330-023-10074-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 06/16/2023] [Accepted: 07/01/2023] [Indexed: 08/23/2023]
Abstract
OBJECTIVES Non-contrast computed tomography of the brain (NCCTB) is commonly used to detect intracranial pathology but is subject to interpretation errors. Machine learning can augment clinical decision-making and improve NCCTB scan interpretation. This retrospective detection accuracy study assessed the performance of radiologists assisted by a deep learning model and compared the standalone performance of the model with that of unassisted radiologists. METHODS A deep learning model was trained on 212,484 NCCTB scans drawn from a private radiology group in Australia. Scans from inpatient, outpatient, and emergency settings were included. Scan inclusion criteria were age ≥ 18 years and series slice thickness ≤ 1.5 mm. Thirty-two radiologists reviewed 2848 scans with and without the assistance of the deep learning system and rated their confidence in the presence of each finding using a 7-point scale. Differences in AUC and Matthews correlation coefficient (MCC) were calculated using a ground-truth gold standard. RESULTS The model demonstrated an average area under the receiver operating characteristic curve (AUC) of 0.93 across 144 NCCTB findings and significantly improved radiologist interpretation performance. Assisted and unassisted radiologists demonstrated an average AUC of 0.79 and 0.73 across 22 grouped parent findings and 0.72 and 0.68 across 189 child findings, respectively. When assisted by the model, radiologist AUC was significantly improved for 91 findings (158 findings were non-inferior), and reading time was significantly reduced. CONCLUSIONS The assistance of a comprehensive deep learning model significantly improved radiologist detection accuracy across a wide range of clinical findings and demonstrated the potential to improve NCCTB interpretation. CLINICAL RELEVANCE STATEMENT This study evaluated a comprehensive CT brain deep learning model, which performed strongly, improved the performance of radiologists, and reduced interpretation time. 
The model may reduce errors, improve efficiency, facilitate triage, and better enable the delivery of timely patient care. KEY POINTS • This study demonstrated that the use of a comprehensive deep learning system assisted radiologists in the detection of a wide range of abnormalities on non-contrast brain computed tomography scans. • The deep learning model demonstrated an average area under the receiver operating characteristic curve of 0.93 across 144 findings and significantly improved radiologist interpretation performance. • The assistance of the comprehensive deep learning model significantly reduced the time required for radiologists to interpret computed tomography scans of the brain.
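The Matthews correlation coefficient (MCC) used in this study summarizes a binary confusion matrix in a single value between -1 and 1. A minimal sketch of the standard formula follows; the function name and the example counts are hypothetical and unrelated to the study's readings.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for one finding.
print(matthews_corrcoef(tp=5, fp=0, tn=5, fn=0))  # 1.0: perfect agreement
print(matthews_corrcoef(tp=0, fp=5, tn=0, fn=5))  # -1.0: total disagreement
print(matthews_corrcoef(tp=1, fp=1, tn=4, fn=2))  # weak positive correlation
```

Unlike accuracy, MCC stays informative when a finding is rare, because it uses all four cells of the confusion matrix.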
Affiliation(s)
- Quinlan D Buchlak
- Annalise.ai, Sydney, NSW, Australia.
- School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia.
- Department of Neurosurgery, Monash Health, Clayton, VIC, Australia.
- Jarrel C Y Seah
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, Alfred Health, Melbourne, VIC, Australia
- Hung Pham
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, University Medical Center, University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam
- Jason I Chiang
- Annalise.ai, Sydney, NSW, Australia
- Department of General Practice, University of Melbourne, Melbourne, VIC, Australia
- Westmead Applied Research Centre, University of Sydney, Sydney, NSW, Australia
- Nazanin Esmaili
- School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia
- Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
- Christine Bennett
- School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia
- Tony Goldschlager
- Department of Neurosurgery, Monash Health, Clayton, VIC, Australia
- Department of Surgery, Monash University, Clayton, VIC, Australia
- Jonathan Hall
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, St Vincent's Health Australia, Melbourne, VIC, Australia
- Department of Radiology, Austin Hospital, Melbourne, VIC, Australia
- Duc Tan Vo
- Department of Radiology, University Medical Center, University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, The University of Adelaide, Adelaide, SA, Australia
- Farrokh Farrokhi
- Center for Neurosciences and Spine, Virginia Mason Franciscan Health, Seattle, WA, USA
- Catherine M Jones
- Annalise.ai, Sydney, NSW, Australia
- I-MED Radiology Network, Brisbane, QLD, Australia
- School of Public and Preventive Health, Monash University, Clayton, VIC, Australia
- Department of Clinical Imaging Science, University of Sydney, Sydney, NSW, Australia
- Simon Edelstein
- Annalise.ai, Sydney, NSW, Australia
- I-MED Radiology Network, Brisbane, QLD, Australia
- Department of Radiology, Monash Health, Clayton, VIC, Australia
- Peter Brotchie
- Annalise.ai, Sydney, NSW, Australia
- Department of Radiology, St Vincent's Health Australia, Melbourne, VIC, Australia
7
Atreya MR, Banerjee S, Lautz AJ, Alder MN, Varisco BM, Wong HR, Muszynski JA, Hall MW, Sanchez-Pinto LN, Kamaleswaran R. Machine learning-driven identification of the gene-expression signature associated with a persistent multiple organ dysfunction trajectory in critical illness. EBioMedicine 2024; 99:104938. [PMID: 38142638 PMCID: PMC10788426 DOI: 10.1016/j.ebiom.2023.104938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/08/2023] [Accepted: 12/12/2023] [Indexed: 12/26/2023]
Abstract
BACKGROUND Multiple organ dysfunction syndrome (MODS) disproportionately drives morbidity and mortality among critically ill patients. However, we lack a comprehensive understanding of its pathobiology. Identification of genes associated with a persistent MODS trajectory may shed light on underlying biology and allow for accurate prediction of those at risk. METHODS Secondary analyses of publicly available gene-expression datasets. Supervised machine learning (ML) was used to identify a parsimonious set of genes associated with a persistent MODS trajectory in a training set of pediatric septic shock. We optimized model parameters and tested risk-prediction capabilities in independent validation and test datasets, respectively. We compared model performance relative to an established gene-set predictive of sepsis mortality. FINDINGS Patients with a persistent MODS trajectory had 568 differentially expressed genes and were characterized by a dysregulated innate immune response. Supervised ML identified 111 genes associated with the outcome of interest on repeated cross-validation, with an AUROC of 0.87 (95% CI: 0.85-0.88) in the training set. The optimized model, limited to 20 genes, achieved AUROCs ranging from 0.74 to 0.79 in the validation and test sets to predict those with persistent MODS, regardless of host age and cause of organ dysfunction. Our classifier demonstrated reproducibility in identifying those with persistent MODS in comparison with a published gene-set predictive of sepsis mortality. INTERPRETATION We demonstrate the utility of supervised ML-driven identification of the genes associated with persistent MODS. Pending validation in enriched cohorts with a high burden of organ dysfunction, such an approach may inform targeted delivery of interventions among at-risk patients. FUNDING H.R.W.'s NIH R35GM126943 award supported the work detailed in this manuscript. Upon his death, the award was transferred to M.N.A. M.R.A., N.S.P., and R.K. were supported by NIH R21GM151703. R.K. was supported by R01GM139967.
Affiliation(s)
- Mihir R Atreya
- Division of Critical Care Medicine, Cincinnati Children's Hospital Medical Center and Cincinnati Children's Research Foundation, Cincinnati, 45229, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA.
- Shayantan Banerjee
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600 036, India
- Andrew J Lautz
- Division of Critical Care Medicine, Cincinnati Children's Hospital Medical Center and Cincinnati Children's Research Foundation, Cincinnati, 45229, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
- Matthew N Alder
- Division of Critical Care Medicine, Cincinnati Children's Hospital Medical Center and Cincinnati Children's Research Foundation, Cincinnati, 45229, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
- Brian M Varisco
- Division of Critical Care Medicine, Cincinnati Children's Hospital Medical Center and Cincinnati Children's Research Foundation, Cincinnati, 45229, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
- Hector R Wong
- Division of Critical Care Medicine, Cincinnati Children's Hospital Medical Center and Cincinnati Children's Research Foundation, Cincinnati, 45229, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
- Jennifer A Muszynski
- Division of Critical Care Medicine, Nationwide Children's Hospital, Columbus, 43205, OH, USA; Department of Pediatrics, Ohio State University, Columbus, 43205, OH, USA
- Mark W Hall
- Division of Critical Care Medicine, Nationwide Children's Hospital, Columbus, 43205, OH, USA; Department of Pediatrics, Ohio State University, Columbus, 43205, OH, USA
- L Nelson Sanchez-Pinto
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, 60611, IL, USA; Department of Health and Biomedical Informatics, Northwestern University Feinberg School of Medicine, Chicago, 60611, IL, USA
- Rishikesan Kamaleswaran
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, 30322, GA, United States; Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, 30322, GA, United States
8
Patnaik P, Khodaee A, Vasam G, Mukherjee A, Salsabili S, Ukwatta E, Grynspan D, Chan ADC, Bainbridge S. Automated detection of microscopic placental features indicative of maternal vascular malperfusion using machine learning. Placenta 2024; 145:19-26. [PMID: 38011757 DOI: 10.1016/j.placenta.2023.11.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 11/07/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023]
Abstract
INTRODUCTION Hypertensive disorders of pregnancy (HDP) and fetal growth restriction (FGR) are common obstetrical complications, often with pathological features of maternal vascular malperfusion (MVM) in the placenta. Currently, clinical placental pathology methods involve a manual visual examination of histology sections, a practice that can be resource-intensive and demonstrates moderate-to-poor inter-pathologist agreement on diagnostic outcomes, dependent on the degree of pathologist sub-specialty training. METHODS This study aims to apply machine learning (ML) feature extraction methods to classify digital images of placental histopathology specimens, collected from cases of HDP [pregnancy-induced hypertension (PIH), preeclampsia (PE), PE + FGR], normotensive FGR, and healthy pregnancies, according to the presence or absence of MVM lesions. 159 digital images were captured from histological placental specimens, manually scored for MVM lesions (MVM- or MVM+) and used to develop a support vector machine (SVM) classifier model, using features extracted from pre-trained ResNet18. The model was trained with data augmentation and shuffling, with the performance assessed for patch-level and image-level classification through measurements of accuracy, precision, and recall using confusion matrices. RESULTS The SVM model demonstrated accuracies of 70% and 79% for patch-level and image-level MVM classification, respectively, with poorest performance observed on images with borderline MVM presence, as determined through post hoc observation. DISCUSSION The results are promising for the integration of ML methods into the placental histopathological examination process. Using this study as a proof-of-concept will lead our group and others to carry ML models further in placental histopathology.
Affiliation(s)
- Purvasha Patnaik
  - Interdisciplinary School of Health Sciences, Faculty of Health Sciences, University of Ottawa, Ottawa, ON, Canada
- Afsoon Khodaee
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
- Goutham Vasam
  - Interdisciplinary School of Health Sciences, Faculty of Health Sciences, University of Ottawa, Ottawa, ON, Canada
- Anika Mukherjee
  - Interdisciplinary School of Health Sciences, Faculty of Health Sciences, University of Ottawa, Ottawa, ON, Canada
- Sina Salsabili
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
- Eranga Ukwatta
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada; School of Engineering, University of Guelph, Guelph, ON, Canada
- David Grynspan
  - Department of Pathology and Laboratory Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada; Children's Hospital of Eastern Ontario, Ottawa, ON, Canada; Department of Pathology and Laboratory Medicine, Faculty of Medicine, The University of British Columbia, Vernon Jubilee Hospital, Vancouver, BC, Canada
- Adrian D C Chan
  - Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
- Shannon Bainbridge
  - Interdisciplinary School of Health Sciences, Faculty of Health Sciences, University of Ottawa, Ottawa, ON, Canada; Department of Cellular and Molecular Medicine, University of Ottawa, Ottawa, ON, Canada
|
9
|
Usuzaki T, Takahashi K, Takagi H, Ishikuro M, Obara T, Yamaura T, Kamimoto M, Majima K. Efficacy of exponentiation method with a convolutional neural network for classifying lung nodules on CT images by malignancy level. Eur Radiol 2023; 33:9309-9319. [PMID: 37477673 DOI: 10.1007/s00330-023-09946-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/12/2023] [Revised: 04/24/2023] [Accepted: 05/19/2023] [Indexed: 07/22/2023]
Abstract
OBJECTIVES The aim of this study was to examine the performance of a convolutional neural network (CNN) combined with exponentiating each pixel value in classifying benign and malignant lung nodules on computed tomography (CT) images. MATERIALS AND METHODS Images in the Lung Image Database Consortium-Image Database Resource Initiative (LIDC-IDRI) were analyzed. Four CNN models were then constructed to classify the lung nodules by malignancy level (malignancy level 1 vs. 2, malignancy level 1 vs. 3, malignancy level 1 vs. 4, and malignancy level 1 vs. 5). The exponentiation method was applied for exponent values of 1.0 to 10.0 in increments of 0.5. Accuracy, sensitivity, specificity, and area under the curve of receiver operating characteristics (AUC-ROC) were calculated. These statistics were compared between an exponent value of 1.0 and all other exponent values in each model by the Mann-Whitney U-test. RESULTS In malignancy 1 vs. 4, maximum test accuracy (MTA; exponent value = 2.0, 3.0, 3.5, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, and 10.0) and specificity (6.5, 7.0, and 9.0) were improved by up to 0.012 and 0.037, respectively. In malignancy 1 vs. 5, MTA (6.5 and 7.0) and sensitivity (1.5) were improved by up to 0.030 and 0.0040, respectively. CONCLUSIONS The exponentiation method improved the performance of the CNN in the task of classifying lung nodules on CT images as benign or malignant. The exponentiation method demonstrated two advantages: improved accuracy, and the ability to adjust sensitivity and specificity by selecting an appropriate exponent value. CLINICAL RELEVANCE STATEMENT Adjustment of sensitivity and specificity by selecting an exponent value enables the construction of proper CNN models for screening, diagnosis, and treatment processes among patients with lung nodules. KEY POINTS • The exponentiation method improved the performance of the convolutional neural network. 
• Contrast accentuation by the exponentiation method may derive features of lung nodules.
• Sensitivity and specificity can be adjusted by selecting an exponent value.
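To make the preprocessing idea in this abstract concrete, the following Python sketch raises each pixel value to a chosen exponent and builds the exponent grid of 1.0 to 10.0 in steps of 0.5 mentioned in the methods. It assumes, and the abstract does not state this, that pixel values are first min-max scaled to [0, 1] so that exponentiation accentuates contrast without overflowing. The function name `exponentiate_pixels` is hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def exponentiate_pixels(image, exponent):
    """Raise every pixel to `exponent` after min-max scaling to [0, 1].

    The scaling step is an assumption made for this sketch; an exponent
    of 1.0 leaves the scaled image unchanged.
    """
    img = np.asarray(image, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # A constant image carries no contrast to accentuate.
        return np.zeros_like(img)
    scaled = (img - lo) / (hi - lo)
    return scaled ** exponent


# Exponent grid from the abstract: 1.0 to 10.0 in increments of 0.5.
exponents = np.arange(1.0, 10.5, 0.5)

# Tiny hypothetical "image" to show the effect of exponent 2.0.
patch = [[0, 5], [10, 10]]
print(exponentiate_pixels(patch, 2.0))
```

With values in [0, 1], larger exponents push mid-range intensities toward zero while leaving the brightest pixels near one, which is one way to read the "contrast accentuation" named in the key points.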
Affiliation(s)
- Takuma Usuzaki
  - Department of Diagnostic Radiology, Tohoku University Hospital, 1-1 Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8574, Japan
- Kengo Takahashi
  - Tohoku University Graduate School of Medicine, Sendai, Japan
- Hidenobu Takagi
  - Department of Diagnostic Radiology, Tohoku University Hospital, 1-1 Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8574, Japan
  - Department of Advanced MRI Collaborative Research, Graduate School of Medicine, Tohoku University, Sendai, Japan
- Mami Ishikuro
  - Division of Molecular Epidemiology, Graduate School of Medicine, Tohoku University, Sendai, Miyagi, Japan
- Taku Obara
  - Division of Molecular Epidemiology, Graduate School of Medicine, Tohoku University, Sendai, Miyagi, Japan
  - Division of Molecular Epidemiology, Department of Preventive Medicine and Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
  - Department of Pharmaceutical Sciences, Tohoku University Hospital, Sendai, Japan
|
10
|
Chavalparit P, Wilartratsami S, Santipas B, Ittichaiwong P, Veerakanjana K, Luksanapruksa P. Development of Machine-Learning Models to Predict Ambulation Outcomes Following Spinal Metastasis Surgery. Asian Spine J 2023; 17:1013-1023. [PMID: 38050361 PMCID: PMC10764138 DOI: 10.31616/asj.2023.0051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 06/30/2023] [Accepted: 07/10/2023] [Indexed: 12/06/2023] Open
Abstract
STUDY DESIGN Retrospective cohort study. PURPOSE This study aimed to develop machine-learning algorithms to predict ambulation outcomes following surgery for spinal metastasis. OVERVIEW OF LITERATURE Postoperative ambulation status following spinal metastasis surgery is currently difficult to predict. An improved ability to predict this important postoperative outcome would facilitate management decision-making and help in determining realistic treatment goals. METHODS This retrospective study included patients who underwent spinal metastasis surgery at a university-based medical center in Thailand between January 2009 and November 2021. Collected data included preoperative parameters and ambulatory status 90 and 180 days following surgery. Thirteen machine-learning algorithms, namely, artificial neural network, logistic regression, CatBoost classifier, linear discriminant analysis, extreme gradient boosting, extra trees classifier, random forest classifier, gradient boosting classifier, light gradient boosting machine, naïve Bayes, K-neighbor classifier, AdaBoost classifier, and decision tree classifier, were developed to predict ambulatory status 90 and 180 days following surgery. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and F1-score. RESULTS In total, 167 patients were enrolled. The number of patients classified as ambulatory 90 and 180 days following surgery was 140 (81.9%) and 137 (82.0%), respectively. The extreme gradient boosting algorithm was found to most accurately predict 180-day ambulatory outcome (AUC, 0.85; F1-score, 0.90), and the decision tree algorithm most accurately predicted 90-day ambulatory outcome (AUC, 0.94; F1-score, 0.88). CONCLUSIONS Machine-learning algorithms were effective in predicting ambulatory status following surgery for spinal metastasis. Based on our data, the extreme gradient boosting and decision tree algorithms best predicted postoperative ambulatory status 180 and 90 days after spinal metastasis surgery, respectively.
Affiliation(s)
- Piya Chavalparit
  - Department of Orthopaedic Surgery, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok, Thailand
- Sirichai Wilartratsami
  - Department of Orthopaedic Surgery, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Borriwat Santipas
  - Department of Orthopaedic Surgery, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Piyalitt Ittichaiwong
  - Siriraj Informatics and Data Innovation Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Kanyakorn Veerakanjana
  - Siriraj Informatics and Data Innovation Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Panya Luksanapruksa
  - Department of Orthopaedic Surgery, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
|
11
|
Hu H, Wei XY, Liu L, Wang YB, Jia HJ, Bu LK, Pei DS. Supervised machine learning improves general applicability of eDNA metabarcoding for reservoir health monitoring. Water Res 2023; 246:120686. [PMID: 37812979 DOI: 10.1016/j.watres.2023.120686] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/19/2023] [Revised: 09/25/2023] [Accepted: 09/29/2023] [Indexed: 10/11/2023]
Abstract
Effective and standardized monitoring methodologies are vital for successful reservoir restoration and management. Environmental DNA (eDNA) metabarcoding sequencing offers a promising alternative for biomonitoring and can overcome many limitations of traditional morphological bioassessment. Recent attempts have even shown that supervised machine learning (SML) can directly infer biotic indices (BI) from eDNA metabarcoding data, bypassing the cumbersome calculation process of BI regardless of the taxonomic assignment of eDNA sequences. However, questions surrounding the general applicability of this taxonomy-free approach to monitoring reservoir health remain unresolved, including model stability, feature selection, algorithm choice, and multi-season biomonitoring. Here, we first developed a novel biological integrity index (Me-IBI) that integrates multitrophic interactions and environmental information, based on taxonomy-assigned eDNA metabarcoding data. The Me-IBI can better distinguish the actual health status of the Three Gorges Reservoir (TGR) than physicochemical assessments and has a clear response to human activity. Then, taking this reliable Me-IBI as a supervised label, we compared the impact of selecting different numbers of features and SML algorithms on the stability and predictive performance of the model for predicting ecological conditions in multiple seasons using taxonomy-free eDNA metabarcoding data. We discovered that even with a small number of features, different SML algorithms can establish a stable model and obtain excellent predictive performance. Finally, we proposed a four-step strategy for standardized routine biomonitoring using SML tools. Our study is the first to explore the general applicability problem of the taxonomy-free eDNA-SML approach and establishes a solid foundation for large-scale and standardized biomonitoring applications.
Affiliation(s)
- Huan Hu
  - Chongqing Jiaotong University, Chongqing, 400074, China; Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- Xing-Yi Wei
  - Chongqing Jiaotong University, Chongqing, 400074, China; Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- Li Liu
  - Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China; Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
- Yuan-Bo Wang
  - Chongqing Jiaotong University, Chongqing, 400074, China; Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- Huang-Jie Jia
  - Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- Ling-Kang Bu
  - Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- De-Sheng Pei
  - School of Public Health, Chongqing Medical University, Chongqing, 400016, China
|
12
|
Haghish EF, Obaidi M, Strømme T, Bjørgo T, Grønnerød C. Mental Health, Well-Being, and Adolescent Extremism: A Machine Learning Study on Risk and Protective Factors. Res Child Adolesc Psychopathol 2023; 51:1699-1714. [PMID: 37535227 PMCID: PMC10627959 DOI: 10.1007/s10802-023-01105-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/18/2023] [Indexed: 08/04/2023]
Abstract
We examined the relationship between adolescents' extremist attitudes and a multitude of mental health, well-being, psycho-social, environmental, and lifestyle variables, using a state-of-the-art machine learning procedure and a nationally representative survey dataset of Norwegian adolescents (N = 11,397). Three key research questions were addressed: 1) can adolescents with extremist attitudes be distinguished from those without, using psycho-socio-environmental survey items, 2) what are the most important predictors of adolescents' extremist attitudes, and 3) do the identified predictors correspond to specific latent factorial structures? Of the total sample, 17.6% showed elevated levels of extremist attitudes. The prevalence was significantly higher among boys and younger adolescents than girls and older adolescents, respectively. The machine learning model reached an AUC of 76.7%, with an equal sensitivity and specificity of 70.5% in the test dataset, demonstrating a satisfactory performance for the model. Items reflecting on positive parenting, quality of relationships with parents and peers, externalizing behavior, and well-being emerged as significant predictors of extremism. Exploratory factor analysis partially supported the suggested latent clusters. Out of the 550 psycho-socio-environmental variables analyzed, behavioral problems, individual and social well-being, along with basic needs such as a secure family environment and interpersonal relationships with parents and peers emerged as significant factors contributing to susceptibility to extremism among adolescents.
Affiliation(s)
- E F Haghish
  - Department of Psychology, University of Oslo, Oslo, Norway
- Milan Obaidi
  - Department of Psychology, University of Oslo, Oslo, Norway
  - Department of Psychology, Copenhagen University, Copenhagen, Denmark
- Thea Strømme
  - Centre for the Study of Professions, Oslo Metropolitan University, Oslo, Norway
- Tore Bjørgo
  - Department of Psychology, University of Oslo, Oslo, Norway
- Cato Grønnerød
  - Department of Psychology, University of Oslo, Oslo, Norway
|
13
|
Liu Q, Zhang W, Pei Y, Tao H, Ma J, Li R, Zhang F, Wang L, Shen L, Liu Y, Jia X, Hu Y. Gut mycobiome as a potential non-invasive tool in early detection of lung adenocarcinoma: a cross-sectional study. BMC Med 2023; 21:409. [PMID: 37904139 PMCID: PMC10617124 DOI: 10.1186/s12916-023-03095-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 09/26/2023] [Indexed: 11/01/2023] Open
Abstract
BACKGROUND The gut mycobiome of patients with lung adenocarcinoma (LUAD) remains unexplored. This study aimed to characterize the gut mycobiome in patients with LUAD and evaluate the potential of gut fungi as non-invasive biomarkers for early diagnosis. METHODS In total, 299 fecal samples from Beijing, Suzhou, and Hainan were collected prospectively. Using internal transcribed spacer 2 sequencing, we profiled the gut mycobiome. Five supervised machine learning algorithms were trained on fungal signatures to build an optimized prediction model for LUAD in a discovery cohort comprising 105 patients with LUAD and 61 healthy controls (HCs) from Beijing. Validation cohorts from Beijing, Suzhou, and Hainan comprising 44, 17, and 15 patients with LUAD and 26, 19, and 12 HCs, respectively, were used to evaluate efficacy. RESULTS Fungal biodiversity and richness increased in patients with LUAD. At the phylum level, the abundance of Ascomycota decreased, while that of Basidiomycota increased in patients with LUAD. Candida and Saccharomyces were the dominant genera, with a reduction in Candida and an increase in Saccharomyces, Aspergillus, and Apiotrichum in patients with LUAD. Nineteen operational taxonomic unit markers were selected, and excellent performance in predicting LUAD was achieved (area under the curve (AUC) = 0.9350) using a random forest model with outcomes superior to those of four other algorithms. The AUCs of the Beijing, Suzhou, and Hainan validation cohorts were 0.9538, 0.9628, and 0.8833, respectively. CONCLUSIONS For the first time, the gut fungal profiles of patients with LUAD were shown to represent potential non-invasive biomarkers for early-stage diagnosis.
Affiliation(s)
- Qingyan Liu
  - Graduate School, Chinese People's Liberation Army Medical School, Beijing, China
  - Department of Oncology, Fifth Medical Center of the Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100000, China
- Weidong Zhang
  - Graduate School, Chinese People's Liberation Army Medical School, Beijing, China
  - Department of Thoracic Surgery, First Medical Center of the Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100000, China
- Yanbin Pei
  - Graduate School, Chinese People's Liberation Army Medical School, Beijing, China
- Haitao Tao
  - Department of Oncology, Fifth Medical Center of the Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100000, China
- Junxun Ma
  - Department of Oncology, Fifth Medical Center of the Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100000, China
- Rong Li
  - Department of Health Medicine, Second Medical Center of the Chinese People's Liberation Army General Hospital, Beijing, China
- Fan Zhang
  - Department of Oncology, Fifth Medical Center of the Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100000, China
- Lijie Wang
  - Department of Oncology, Fifth Medical Center of the Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100000, China
- Leilei Shen
  - Department of Thoracic Surgery, Hainan Medical Center of the Chinese People's Liberation Army General Hospital, Hainan, China
- Yang Liu
  - Department of Thoracic Surgery, First Medical Center of the Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100000, China
- Xiaodong Jia
  - Department of Oncology, Fifth Medical Center of the Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100000, China
- Yi Hu
  - Department of Oncology, Fifth Medical Center of the Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100000, China
|
14
|
Obukhov NV, Naish PLN, Solnyshkina IE, Siourdaki TG, Martynov IA. Real-time assessment of hypnotic depth, using an EEG-based brain-computer interface: a preliminary study. BMC Res Notes 2023; 16:288. [PMID: 37875937 PMCID: PMC10599062 DOI: 10.1186/s13104-023-06553-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 10/02/2023] [Indexed: 10/26/2023] Open
Abstract
OBJECTIVE Hypnosis can be an effective treatment for many conditions, and there have been attempts to develop instrumental approaches to continuously monitor hypnotic state level ("depth"). However, there is no method that addresses the individual variability of electrophysiological hypnotic correlates. We explore the possibility of using an EEG-based passive brain-computer interface (pBCI) for real-time, individualised estimation of the hypnosis deepening process. RESULTS The wakefulness and deep hypnosis intervals were manually defined and labelled in 27 electroencephalographic (EEG) recordings obtained from eight outpatients after hypnosis sessions. Spectral analysis showed that EEG correlates of deep hypnosis were relatively stable in each patient throughout the treatment but varied between patients. Data from each first session was used to train classification models to continuously assess deep hypnosis probability in subsequent sessions. Models trained using four frequency bands (1.5-45, 1.5-8, 1.5-14, and 4-15 Hz) showed accuracy mostly exceeding 85% in a 10-fold cross-validation. Real-time classification accuracy was also acceptable, so at least one of the four bands yielded results exceeding 74% in any session. The best results averaged across all sessions were obtained using 1.5-14 and 4-15 Hz, with an accuracy of 82%. The revealed issues are also discussed.
Affiliation(s)
- Nikita V Obukhov
  - Research Department, The Association of Experts in the Field of Clinical Hypnosis, 40, Kamennoostrovsky Ave., 410, Saint Petersburg, 197022, Russian Federation
  - Department of Psychotherapy, Academician I.P. Pavlov First St. Petersburg State Medical University, 6-8, L. Tolstoy str, Saint Petersburg, 197022, Russian Federation
- Peter L N Naish
  - Department of Psychology, The Open University, Walton Hall, Milton Keynes, MK7 6AA, UK
- Irina E Solnyshkina
  - Department of Psychotherapy, Academician I.P. Pavlov First St. Petersburg State Medical University, 6-8, L. Tolstoy str, Saint Petersburg, 197022, Russian Federation
- Tatiana G Siourdaki
  - Research Department, The Association of Experts in the Field of Clinical Hypnosis, 40, Kamennoostrovsky Ave., 410, Saint Petersburg, 197022, Russian Federation
- Ilya A Martynov
  - Research Department, The Association of Experts in the Field of Clinical Hypnosis, 40, Kamennoostrovsky Ave., 410, Saint Petersburg, 197022, Russian Federation
|
15
|
Bellamoli F, Di Iorio M, Vian M, Melgani F. Machine learning methods for anomaly classification in wastewater treatment plants. J Environ Manage 2023; 344:118594. [PMID: 37473555 DOI: 10.1016/j.jenvman.2023.118594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 06/21/2023] [Accepted: 07/03/2023] [Indexed: 07/22/2023]
Abstract
Modern wastewater treatment plants base their biological processes on advanced control systems which ensure compliance with discharge limits and minimize energy consumption responding to information from on-line probes. The correct readings of probes are particularly crucial for intermittent aeration controllers, which rely on real-time measurements of ammonia and oxygen in biological tanks. These data are also an important resource for developing artificial intelligence algorithms that can identify process or sensor anomalies, thus guiding the choices of plant operators and automatic process controllers. However, using anomaly detection and classification algorithms in real-time wastewater treatment is challenging because of the noisy nature of sensor measurements, the difficulty of obtaining labeled real-plant data, and the complex and interdependent mechanisms that govern biological processes. This work aims at thoroughly exploring the performance of machine learning methods in detecting and classifying the main anomalies in plants operating with intermittent aeration. Using oxygen, ammonia and aeration power measurements from a set of plants in Italy, we perform both binary and multiclass classification, and we compare them through a rigorous validation procedure that includes a test on an unknown dataset, proposing a new evaluation protocol. The classification methods explored are support vector machine, multilayer perceptron, random forest, and two gradient boosting methods (LightGBM and XGBoost). The best performance was achieved using the gradient boosting ensemble algorithms, with up to 96% of anomalies detected and up to 84% and 62% of anomalies classified correctly on the first and second datasets respectively.
Affiliation(s)
- Francesca Bellamoli
  - University of Trento, Department of Information Engineering and Computer Science, via Sommarive 9, Trento, 38123, Italy; ETC Sustainable Solutions Srl, via dei Palustei 16, Trento, 38121, Italy
- Marco Vian
  - ETC Sustainable Solutions Srl, via dei Palustei 16, Trento, 38121, Italy
- Farid Melgani
  - University of Trento, Department of Information Engineering and Computer Science, via Sommarive 9, Trento, 38123, Italy
|
16
|
Gonçalves DM, Henriques R, Costa RS. Predicting metabolic fluxes from omics data via machine learning: Moving from knowledge-driven towards data-driven approaches. Comput Struct Biotechnol J 2023; 21:4960-4973. [PMID: 37876626 PMCID: PMC10590844 DOI: 10.1016/j.csbj.2023.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/01/2023] [Accepted: 10/01/2023] [Indexed: 10/26/2023] Open
Abstract
The accurate prediction of phenotypes in microorganisms is a main challenge for systems biology. Genome-scale models (GEMs) are a widely used mathematical formalism for predicting metabolic fluxes using constraint-based modeling methods such as flux balance analysis (FBA). However, they require prior knowledge of the metabolic network of an organism and appropriate objective functions, often hampering the prediction of metabolic fluxes under different conditions. Moreover, the integration of omics data to improve the accuracy of phenotype predictions in different physiological states is still in its infancy. Here, we present a novel approach for predicting fluxes under various conditions. We explore the use of supervised machine learning (ML) models using transcriptomics and/or proteomics data and compare their performance against the standard parsimonious FBA (pFBA) approach, using Escherichia coli case studies as an example. Our results show that the proposed omics-based ML approach is promising for predicting both internal and external metabolic fluxes, with smaller prediction errors than the pFBA approach. The code, data, and detailed results are available at the project's repository[1].
Affiliation(s)
- Daniel M. Gonçalves
  - INESC-ID, Rua Alves Redol, 9, Lisbon, 1000-029, Portugal
  - Instituto Superior Técnico, Av. Rovisco Pais, 1, Lisbon, 1049-001, Portugal
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica, 2829-516, Portugal
- Rui Henriques
  - INESC-ID, Rua Alves Redol, 9, Lisbon, 1000-029, Portugal
  - Instituto Superior Técnico, Av. Rovisco Pais, 1, Lisbon, 1049-001, Portugal
- Rafael S. Costa
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica, 2829-516, Portugal
|
17
|
Sun Z, Yuan Y, Dong X, Liu Z, Cai K, Cheng W, Wu J, Qiao Z, Chen A. Supervised machine learning: A new method to predict the outcomes following exercise intervention in children with autism spectrum disorder. Int J Clin Health Psychol 2023; 23:100409. [PMID: 37711468 PMCID: PMC10498172 DOI: 10.1016/j.ijchp.2023.100409] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/23/2023] [Accepted: 08/22/2023] [Indexed: 09/16/2023] Open
Abstract
The individual differences among children with autism spectrum disorder (ASD) may make it challenging to achieve comparable benefits from a specific exercise intervention program. A new method for predicting the possible outcomes and maximizing the benefits of exercise intervention for children with ASD needs further exploration. Using studies of the mini-basketball training program (MBTP), which aims to improve symptoms in children with ASD, as an example, we used supervised machine learning to predict the possible intervention outcomes based on the individual differences of children with ASD, and we investigated and validated the efficacy of this method. In a long-term study, we included 41 children with ASD who received the MBTP. Before the intervention, we collected their clinical information, behavioral factors, and brain structural indicators as candidate factors. To perform the regression and classification tasks, the random forest algorithm was selected, and cross-validation was used to determine the reliability of the prediction results. The regression task was used to predict the social communication impairment outcome following the MBTP in children with ASD, and explained variance was used to evaluate the predictive performance. The classification task was used to distinguish the core symptom outcome groups of children with ASD, and predictive performance was assessed based on accuracy. We discovered that random forest models could predict the outcome of social communication impairment (average explained variance was 30.58%) and core symptoms (average accuracy was 66.12%) following the MBTP, confirming that the supervised machine learning method can predict exercise intervention outcomes for children with ASD. Our findings provide a novel and reliable method for identifying, in advance, the children with ASD most likely to benefit from a specific exercise intervention program, and a solid foundation for establishing a personalized exercise intervention program recommendation system for children with ASD.
Collapse
Affiliation(s)
- Zhiyuan Sun
- College of Physical Education, Yangzhou University, Yangzhou 225127, China
- Institute of Sports, Exercise and Brain, Yangzhou University, Yangzhou 225127, China
| | - Yunhao Yuan
- School of Information Engineering, Yangzhou University, Yangzhou 225127, China
| | - Xiaoxiao Dong
- College of Physical Education, Yangzhou University, Yangzhou 225127, China
- Institute of Sports, Exercise and Brain, Yangzhou University, Yangzhou 225127, China
| | - Zhimei Liu
- College of Physical Education, Yangzhou University, Yangzhou 225127, China
- Institute of Sports, Exercise and Brain, Yangzhou University, Yangzhou 225127, China
| | - Kelong Cai
- College of Physical Education, Yangzhou University, Yangzhou 225127, China
- Institute of Sports, Exercise and Brain, Yangzhou University, Yangzhou 225127, China
| | - Wei Cheng
- College of Physical Education, Yangzhou University, Yangzhou 225127, China
- Institute of Sports, Exercise and Brain, Yangzhou University, Yangzhou 225127, China
| | - Jingjing Wu
- College of Physical Education, Yangzhou University, Yangzhou 225127, China
- Institute of Sports, Exercise and Brain, Yangzhou University, Yangzhou 225127, China
| | - Zhiyuan Qiao
- College of Physical Education, Yangzhou University, Yangzhou 225127, China
- Institute of Sports, Exercise and Brain, Yangzhou University, Yangzhou 225127, China
| | - Aiguo Chen
- College of Physical Education, Yangzhou University, Yangzhou 225127, China
- Institute of Sports, Exercise and Brain, Yangzhou University, Yangzhou 225127, China
- Nanjing Institute of Physical Education, Nanjing 210014, China
18
|
Schellenberg CM, Lindholz M, Grunow JJ, Boie S, Bald A, Warner LO, Ulm B, Milnik A, Zickler D, Angermair S, Reißhauer A, Witzenrath M, Menk M, Balzer F, Ocker T, Weber-Carstens S, Schaller SJ. Mobilisation practices during the SARS-CoV-2 pandemic: A retrospective analysis (MobiCOVID). Anaesth Crit Care Pain Med 2023; 42:101255. [PMID: 37257753 PMCID: PMC10226277 DOI: 10.1016/j.accpm.2023.101255] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/07/2023] [Revised: 05/17/2023] [Accepted: 05/24/2023] [Indexed: 06/02/2023]
Abstract
BACKGROUND Corona Virus Disease 2019 (COVID-19) patients display risk factors for intensive care unit acquired weakness (ICUAW). The pandemic increased existing barriers to mobilisation. This study aimed to compare mobilisation practices in COVID-19 and non-COVID-19 patients. METHODS This retrospective cohort study was conducted at Charité-Universitätsmedizin Berlin, Germany, including adult patients admitted to one of 16 ICUs between March 2018, and November 2021. The effect of COVID-19 on mobilisation level and frequency, early mobilisation (EM) and time to active sitting position (ASP) was analysed. Subgroup analysis on COVID-19 patients and the ICU type influencing mobilisation practices was performed. Mobilisation entries were converted into the ICU mobility scale (IMS) using supervised machine learning. The groups were matched using 1:1 propensity score matching. RESULTS A total of 12,462 patients were included, receiving 59,415 mobilisations. After matching 611 COVID-19 and non-COVID-19 patients were analysed. They displayed no significant difference in mobilisation frequency (0.4 vs. 0.3, p = 0.7), maximum IMS (3 vs. 3; p = 0.17), EM (43.2% vs. 37.8%; p = 0.06) or time to ASP (HR 0.95; 95% CI: 0.82, 1.09; p = 0.44). Subgroup analysis showed that patients in surge ICUs, i.e., temporarily created ICUs for COVID-19 patients during the pandemic, more commonly received EM (53.9% vs. 39.8%; p = 0.03) and reached higher maximum IMS (4 vs. 3; p = 0.03) without difference in mobilisation frequency (0.5 vs. 0.3; p = 0.32) or time to ASP (HR 1.15; 95% CI: 0.85, 1.56; p = 0.36). CONCLUSION COVID-19 did not hinder mobilisation. Those treated in surge ICUs were more likely to receive EM and reached higher mobilisation levels.
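The 1:1 propensity score matching step can be illustrated with a greedy nearest-neighbour sketch. The abstract does not state the matching algorithm, the covariates, or any caliper, so the greedy strategy, the `caliper` value, and the patient identifiers below are all assumptions; the propensity scores are taken as already estimated.

```python
def match_one_to_one(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated / controls map a patient id to a precomputed propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    # Fixed processing order so the result is reproducible.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs

# Hypothetical scores: COVID-19 patients as "treated", others as controls.
covid = {"t1": 0.30, "t2": 0.70}
non_covid = {"c1": 0.32, "c2": 0.68, "c3": 0.10}
pairs = match_one_to_one(covid, non_covid)
```

Patients with no control inside the caliper stay unmatched, which is one reason a matched sample can be much smaller than the full cohort.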
Affiliation(s)
- Clara M Schellenberg
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Anesthesiology and Intensive Care Medicine | CCM | CVK, Berlin, Germany
| | - Maximilian Lindholz
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Anesthesiology and Intensive Care Medicine | CCM | CVK, Berlin, Germany
| | - Julius J Grunow
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Anesthesiology and Intensive Care Medicine | CCM | CVK, Berlin, Germany
| | - Sebastian Boie
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Berlin, Germany
| | - Annika Bald
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Anesthesiology and Intensive Care Medicine | CCM | CVK, Berlin, Germany
| | - Linus O Warner
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Anesthesiology and Intensive Care Medicine | CCM | CVK, Berlin, Germany
| | - Bernhard Ulm
- Technical University of Munich, School of Medicine, Department of Anesthesiology and Intensive Care, Munich, Germany; Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, Ulm, Germany
| | - Annette Milnik
- Research Platform Molecular and Cognitive Neurosciences (MCN), Department of Biomedicine, University of Basel, Basel, Switzerland
| | - Daniel Zickler
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Nephrology and Medical Intensive Care, Berlin, Germany
| | - Stefan Angermair
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Anesthesiology and Intensive Care Medicine (CBF), Berlin, Germany
| | - Anett Reißhauer
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Rehabilitation Medicine, Berlin, Germany
| | - Martin Witzenrath
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Infectious Diseases, Pulmonary Medicine and Critical Care, Berlin, Germany; German Center for Lung Research (DZL), Berlin, Germany
| | - Mario Menk
- Department of Anesthesiology and Intensive Care Medicine, University Hospital "Carl Gustav Carus", Technische Universität Dresden, Dresden, Germany
| | - Felix Balzer
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Berlin, Germany
| | - Thomas Ocker
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Anesthesiology and Intensive Care Medicine | CCM | CVK, Berlin, Germany; Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Berlin, Germany
| | - Steffen Weber-Carstens
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Anesthesiology and Intensive Care Medicine | CCM | CVK, Berlin, Germany
| | - Stefan J Schaller
- Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Anesthesiology and Intensive Care Medicine | CCM | CVK, Berlin, Germany; Technical University of Munich, School of Medicine, Department of Anesthesiology and Intensive Care, Munich, Germany.
19
|
Chen YL, Kraus SW, Freeman MJ, Freeman AJ. A Machine-Learning Approach to Assess Factors Associated With Hospitalization of Children and Youths in Psychiatric Crisis. Psychiatr Serv 2023; 74:943-949. [PMID: 36916060 DOI: 10.1176/appi.ps.20220201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/16/2023]
Abstract
OBJECTIVE The authors used a machine-learning approach to model clinician decision making regarding psychiatric hospitalization of children and youths in crisis and to identify factors associated with the decision to hospitalize. METHODS Data consisted of 4,786 mobile crisis response team assessments of children and youths, ages 4.0-19.5 years (mean±SD=14.0±2.7 years, 56% female), in Nevada. The sample assessments were split into training and testing data sets. A random-forest machine-learning algorithm was used to identify variables related to the decision to hospitalize a child or youth after the crisis assessment. Results from the training sample were externally validated in the testing sample. RESULTS The random-forest model had good performance (area under the curve training sample=0.91, testing sample=0.92). Variables found to be important in the decision to hospitalize a child or youth were acute suicidality, followed by poor judgment or decision making, danger to others, impulsivity, runaway behavior, other risky behaviors, nonsuicidal self-injury, psychotic or depressive symptoms, sleep problems, oppositional behavior, poor functioning at home or with peers, depressive or schizophrenia spectrum disorders, and age. CONCLUSIONS In crisis settings, clinicians were found to mostly focus on acute factors that increased risk for danger to self or others (e.g., suicidality, poor judgment), current psychiatric symptoms (e.g., psychotic symptoms), and functioning (e.g., poor home functioning, problems with peer relationships) when deciding whether to hospitalize or stabilize a child or youth. To reduce psychiatric hospitalization, community-based services should target interventions to address these important factors associated with the need for a higher level of care among youths in psychiatric crisis.
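The area under the curve reported for the training and testing samples has a direct rank interpretation: it is the probability that a randomly chosen hospitalized case receives a higher model score than a randomly chosen non-hospitalized case. A minimal sketch of that definition follows; the function name and the toy inputs are illustrative and not taken from the study.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Computing this separately on the training and on the held-out testing assessments gives the two figures quoted in the abstract; similar values on both suggest little overfitting.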
Affiliation(s)
- Yen-Ling Chen
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas (Chen, Kraus); Boys and Girls Clubs of Southern Nevada, Las Vegas (M. J. Freeman); Inspiring Children Foundation, Las Vegas (A. J. Freeman)
| | - Shane W Kraus
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas (Chen, Kraus); Boys and Girls Clubs of Southern Nevada, Las Vegas (M. J. Freeman); Inspiring Children Foundation, Las Vegas (A. J. Freeman)
| | - Megan J Freeman
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas (Chen, Kraus); Boys and Girls Clubs of Southern Nevada, Las Vegas (M. J. Freeman); Inspiring Children Foundation, Las Vegas (A. J. Freeman)
| | - Andrew J Freeman
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas (Chen, Kraus); Boys and Girls Clubs of Southern Nevada, Las Vegas (M. J. Freeman); Inspiring Children Foundation, Las Vegas (A. J. Freeman)
20
|
Koch RA, Boucsein M, Brons S, Alber M, Bahn E. A time-resolved clonogenic assay for improved cell survival and RBE measurements. Clin Transl Radiat Oncol 2023; 42:100662. [PMID: 37576069 PMCID: PMC10412889 DOI: 10.1016/j.ctro.2023.100662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 06/20/2023] [Accepted: 07/20/2023] [Indexed: 08/15/2023] Open
Abstract
Purpose The in vitro clonogenic assay (IVCA) is the mainstay of quantitative radiobiology. Here, we investigate the benefit of a time-resolved IVCA version (trIVCA) to improve the quantification of clonogenic survival and relative biological effectiveness (RBE) by analyzing cell colony growth behavior. Materials & Methods In the IVCA, clonogenicity classification of cell colonies is performed based on a fixed colony size threshold after incubation. In contrast, using trIVCA, we acquire time-lapse microscopy images during incubation and track the growth of each colony using neural-net-based image segmentation. Attributes of the resulting growth curves are then used as predictors for a decision tree classifier to determine clonogenicity of each colony. The method was applied to three cell lines, each irradiated with 250 kV X-rays in the range 0-8 Gy and carbon ions of high LET (100 keV/μm, dose-averaged) in the range 0-2 Gy. We compared the cell survival curves determined by trIVCA to those from the classical IVCA across different size thresholds and incubation times. Further, we investigated the impact of the assaying method on RBE determination. Results Size distributions of abortive and clonogenic colonies overlap consistently, rendering perfect separation via size threshold unfeasible at any readout time. This effect is dose-dependent, systematically inflating the steepness and curvature of cell survival curves. Consequently, resulting cell survival estimates show variability between 3% and 105%. This uncertainty propagates into RBE calculation with variability between 8% and 25% at 2 Gy. Determining clonogenicity based on growth curves has an accuracy of 95% on average. Conclusion The IVCA suffers from substantial uncertainty caused by the overlap of size distributions of delayed abortive and clonogenic colonies. This impairs precise quantification of cell survival and RBE. By considering colony growth over time, our method improves assaying clonogenicity.
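To see how uncertainty in a survival curve carries over into RBE, it helps to write RBE as a ratio of iso-effective doses: the reference (X-ray) dose divided by the test (carbon ion) dose that produces the same surviving fraction. The sketch below assumes the linear-quadratic survival model, which the abstract does not name, and uses made-up `alpha` and `beta` values purely for illustration.

```python
import math

def lq_survival(dose, alpha, beta):
    """Linear-quadratic surviving fraction S = exp(-alpha*D - beta*D^2)."""
    return math.exp(-alpha * dose - beta * dose ** 2)

def iso_effect_dose(survival, alpha, beta):
    """Dose that yields the given surviving fraction under the LQ model."""
    effect = -math.log(survival)  # equals alpha*D + beta*D^2
    if beta == 0:
        return effect / alpha
    return (-alpha + math.sqrt(alpha ** 2 + 4 * beta * effect)) / (2 * beta)

def rbe(test_dose, test_params, ref_params):
    """RBE = reference dose / test dose at equal surviving fraction."""
    survival = lq_survival(test_dose, *test_params)
    return iso_effect_dose(survival, *ref_params) / test_dose

# Hypothetical parameters (alpha in 1/Gy, beta in 1/Gy^2), not from the study.
carbon = (1.0, 0.0)
xray = (0.2, 0.05)
rbe_at_2gy = rbe(2.0, carbon, xray)
```

Because both fitted curves enter the ratio, any size-threshold-dependent bias in the steepness or curvature of either survival curve shifts the resulting RBE.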
Affiliation(s)
- Robin A Koch
- Department of Radiation Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 672, 69120 Heidelberg, Germany
- Heidelberg Institute of Radiation Oncology (HIRO), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
| | - Marc Boucsein
- Department of Radiation Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 672, 69120 Heidelberg, Germany
- Heidelberg Institute of Radiation Oncology (HIRO), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
| | - Stephan Brons
- Heidelberg Institute of Radiation Oncology (HIRO), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Heidelberg Ion-Beam Therapy Center (HIT), Department of Radiation Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 450, 69120 Heidelberg, Germany
| | - Markus Alber
- Department of Radiation Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 672, 69120 Heidelberg, Germany
- Heidelberg Institute of Radiation Oncology (HIRO), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
| | - Emanuel Bahn
- Department of Radiation Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 672, 69120 Heidelberg, Germany
- Heidelberg Institute of Radiation Oncology (HIRO), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 460, 69120 Heidelberg, Germany
- Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
21
|
Cui C, Li Y, Liu S, Wang P, Huang Z. The unsupervised machine learning to analyze the use strategy of statins for ischaemic stroke patients with elevated transaminase. Clin Neurol Neurosurg 2023; 232:107900. [PMID: 37478641 DOI: 10.1016/j.clineuro.2023.107900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 07/02/2023] [Accepted: 07/15/2023] [Indexed: 07/23/2023]
Abstract
BACKGROUND AND PURPOSE Statins can elevate hepatic transaminase in ischemic stroke patients. More evidence is needed on whether stopping statins or adjusting the statin dose is better for these patients, and no evidence shows which strategy suits which patients. METHODS We enrolled ischemic stroke patients whose hepatic transaminase was elevated while they were taking statins. The outcomes were recurrent stroke rate, transaminase value after stopping or adjusting statins, mortality, and favorable functional outcome (FFO). We compared outcome events between the stopped group and the adjustment group. We also grouped all patients by unsupervised machine learning and analyzed the characteristics of the resulting groups. RESULTS Patients who stopped statins had a higher stroke recurrence and rate of FFO (mRS 0-2), and a lower mean transaminase value and mortality. Among the groups produced by unsupervised machine learning, the km2 group had the lowest stroke recurrence (p = 0.046), lowest mortality (p = 0.049), and highest FFO (p = 0.023). Patients in the km2 group were younger (p < 0.001), more often male (p < 0.001), had lower National Institutes of Health Stroke Scale (NIHSS) scores (p < 0.001), and had slightly higher blood pressure values (p = 0.002). Grouping by unsupervised machine learning could improve model performance. CONCLUSION For ischemic stroke patients with elevated hepatic transaminase, temporarily stopping statins was the better treatment strategy. Patients who were younger, male, and had a lower NIHSS score and a slightly higher blood lipid value at admission could have a better prognosis.
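The abstract does not name the clustering algorithm; the group label "km2" hints at k-means, but that is an assumption. Under that assumption, grouping patients without outcome labels can be sketched as below, with hypothetical two-feature points standing in for patient characteristics.

```python
def k_means(points, k, iterations=50):
    """Plain k-means on tuples of floats; returns (centroids, labels)."""
    centroids = [points[i] for i in range(k)]  # deterministic start: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it is
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical standardized features, e.g. (age, NIHSS score) after scaling.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.8, 1.2), (9.2, 8.8)]
centroids, labels = k_means(points, k=2)
```

With real clinical variables the features would normally be standardized first, since k-means distances are otherwise dominated by whichever variable has the largest numeric range.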
Affiliation(s)
- Chaohua Cui
- Affiliated Hospital of Youjiang Medical University for Nationalities, Youjiang District, Baise, Guangxi, China.
| | - Yuchuan Li
- Affiliated Liutie Central Hospital of Guangxi Medical University, Liunan District, Liuzhou, Guangxi, China
| | - Shaohui Liu
- Affiliated Liutie Central Hospital of Guangxi Medical University, Liunan District, Liuzhou, Guangxi, China
| | - Ping Wang
- Affiliated Primary School Liugong Middle School, Liunan District, Liuzhou, Guangxi, China
| | - Zhonghua Huang
- Affiliated Liutie Central Hospital of Guangxi Medical University, Liunan District, Liuzhou, Guangxi, China
22
|
Pielsticker L, Nicholls RL, DeBeer S, Greiner M. Convolutional neural network framework for the automated analysis of transition metal X-ray photoelectron spectra. Anal Chim Acta 2023; 1271:341433. [PMID: 37328241 DOI: 10.1016/j.aca.2023.341433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 05/15/2023] [Accepted: 05/26/2023] [Indexed: 06/18/2023]
Abstract
X-ray photoelectron spectroscopy is an indispensable technique for the quantitative determination of sample composition and electronic structure in diverse research fields. Quantitative analysis of the phases present in XP spectra is usually conducted manually by means of empirical peak fitting performed by trained spectroscopists. However, with recent advancements in the usability and reliability of XPS instruments, ever more (inexperienced) users are creating increasingly large data sets that are harder to analyze by hand. In order to aid users with the analysis of large XPS data sets, more automated, easy-to-use analysis techniques are needed. Here, we propose a supervised machine learning framework based on artificial convolutional neural networks. By training such networks on large numbers of artificially created XP spectra with known quantifications (i.e., for each spectrum, the concentration of each chemical species is known), we created universally applicable models for auto-quantification of transition-metal XPS data that are able to predict the sample composition from spectra within seconds. Upon evaluation against more traditional peak fitting methods, we showed that these neural networks achieve competitive quantification accuracy. The proposed framework is shown to be flexible enough to accommodate spectra containing multiple chemical elements and measured with different experimental parameters. The use of dropout variational inference for the determination of quantification uncertainty is illustrated.
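The idea behind dropout variational inference (often called Monte Carlo dropout) is to keep dropout switched on at prediction time, run the same input through the network many times, and read the spread of the outputs as an uncertainty estimate. The sketch below shows only that mechanism on a toy single linear layer; the convolutional architecture, the weights, and the dropout rate used by the authors are not given in the abstract, so every value here is hypothetical.

```python
import random
from statistics import mean, stdev

def dropout_forward(x, weights, drop_prob, rng):
    """One stochastic pass of a linear layer with inverted dropout on its inputs."""
    keep = 1.0 - drop_prob
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:      # this input unit survives the mask
            total += wi * xi / keep  # rescale so the expected output is unchanged
    return total

def mc_dropout_predict(x, weights, drop_prob=0.2, passes=500, seed=0):
    """Return (mean, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [dropout_forward(x, weights, drop_prob, rng) for _ in range(passes)]
    return mean(samples), stdev(samples)

# Hypothetical spectrum features and weights.
estimate, uncertainty = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, 0.25, 0.1])
```

The mean plays the role of the predicted quantity (for example a species concentration) and the standard deviation the role of its quantification uncertainty.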
Affiliation(s)
- Lukas Pielsticker
- Max Planck Institute for Chemical Energy Conversion, Stiftstr. 34-36, 45470, Muelheim an der Ruhr, Germany.
| | - Rachel L Nicholls
- Max Planck Institute for Chemical Energy Conversion, Stiftstr. 34-36, 45470, Muelheim an der Ruhr, Germany
| | - Serena DeBeer
- Max Planck Institute for Chemical Energy Conversion, Stiftstr. 34-36, 45470, Muelheim an der Ruhr, Germany
| | - Mark Greiner
- Max Planck Institute for Chemical Energy Conversion, Stiftstr. 34-36, 45470, Muelheim an der Ruhr, Germany
23
|
Nakayama LF, Zago Ribeiro L, de Oliveira JAE, de Matos JCRG, Mitchell WG, Malerbi FK, Celi LA, Regatieri CVS. Fairness and generalizability of OCT normative databases: a comparative analysis. Int J Retina Vitreous 2023; 9:48. [PMID: 37605208 PMCID: PMC10440930 DOI: 10.1186/s40942-023-00459-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 03/26/2023] [Indexed: 08/23/2023] Open
Abstract
PURPOSE In supervised Machine Learning algorithms, labels and reports are important in model development. To provide a normality assessment, the OCT has an in-built normative database that provides a color-coded scale based on comparing each measurement with the database. This article aims to evaluate and compare normative databases of different OCT machines, analyzing patient demographics, inclusion and exclusion criteria, diversity index, and statistical approach to assess their fairness and generalizability. METHODS Data were retrieved from the FDA-approval documents and equipment manuals of Cirrus, Avanti, Spectralis, and Triton. The following variables were compared: number of eyes and patients, inclusion and exclusion criteria, statistical approach, sex, race and ethnicity, age, participant country, and diversity index. RESULTS Avanti OCT has the largest normative database (640 eyes). In every database, the inclusion and exclusion criteria were similar, including adult patients and excluding pathological eyes. Proportionately, Spectralis has the largest White representation (79.7%), Cirrus the largest Asian (24%), and Triton the largest Black (22%) patient representation. In all databases, the statistical analysis applied was regression models. The sex diversity index is similar in all datasets, and comparable to that of the ten most populous countries. The Avanti dataset has the highest diversity index in terms of race, followed by Cirrus, Triton, and Spectralis. CONCLUSION In all analyzed databases, the data framework is static, with limited upgrade options and lacking normative databases for new modules. As a result, caution in OCT normality interpretation is warranted. To address these limitations, there is a need for more diverse, representative, and open-access datasets that take into account patient demographics, especially considering the development of supervised Machine Learning algorithms in healthcare.
Affiliation(s)
- Luis Filipe Nakayama
- Laboratory of Computational Physiology, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, United States of America.
- Department of Ophthalmology, São Paulo Federal University, Sao Paulo, SP, Brazil.
| | - Lucas Zago Ribeiro
- Department of Ophthalmology, São Paulo Federal University, Sao Paulo, SP, Brazil
| | - João Carlos Ramos Gonçalves de Matos
- Laboratory of Computational Physiology, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, United States of America
- University of Porto, Porto, Portugal
| | - Leo Anthony Celi
- Laboratory of Computational Physiology, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA, 02139, United States of America
- Department of Biostatistics, United States of America, Harvard TH Chan School of Public Health, Boston, MA, United States of America
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, United States of America
24
|
Chicco D, Jurman G. A statistical comparison between Matthews correlation coefficient (MCC), prevalence threshold, and Fowlkes-Mallows index. J Biomed Inform 2023; 144:104426. [PMID: 37352899 DOI: 10.1016/j.jbi.2023.104426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 06/09/2023] [Accepted: 06/15/2023] [Indexed: 06/25/2023]
Abstract
Although assessing binary classifications is a common task in scientific research, no consensus on a single statistic summarizing the confusion matrix has been reached so far. In recent studies, we demonstrated the advantages of the Matthews correlation coefficient (MCC) over other popular rates such as cross-entropy error, F1 score, accuracy, balanced accuracy, bookmaker informedness, diagnostic odds ratio, Brier score, and Cohen's kappa. In this study, we compared the MCC to two other statistics: prevalence threshold (PT), frequently used in obstetrics and gynecology, and the Fowlkes-Mallows index, a metric employed in fuzzy logic and drug discovery. Through the investigation of the mutual relations among the three metrics and the study of some relevant use cases, we show that, when positive data elements and negative data elements have the same importance, the Matthews correlation coefficient can be more informative than its two competitors, this time as well.
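All three statistics compared in this abstract are functions of the same four confusion-matrix counts, which a short sketch makes concrete. The formulas are the standard definitions; the function name and the example counts are illustrative, and the code does not guard against empty rows or columns, which would cause a division by zero.

```python
import math

def confusion_metrics(tp, fn, fp, tn):
    """Return MCC, Fowlkes-Mallows index and prevalence threshold for a 2x2 table."""
    tpr = tp / (tp + fn)  # sensitivity / recall
    fpr = fp / (fp + tn)  # 1 - specificity
    ppv = tp / (tp + fp)  # precision
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    fm = math.sqrt(ppv * tpr)
    # Algebraically equal to (sqrt(TPR*FPR) - FPR) / (TPR - FPR),
    # but still defined when TPR == FPR.
    pt = math.sqrt(fpr) / (math.sqrt(tpr) + math.sqrt(fpr))
    return {"mcc": mcc, "fowlkes_mallows": fm, "prevalence_threshold": pt}

# Hypothetical counts: 90 TP, 10 FN, 20 FP, 80 TN.
example = confusion_metrics(90, 10, 20, 80)
```

Note that only the MCC uses all four cells symmetrically: the Fowlkes-Mallows index ignores true negatives, and the prevalence threshold depends only on the two rates TPR and FPR.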
25
|
Reis FJJ, Bittencourt JV, Calestini L, de Sá Ferreira A, Meziat-Filho N, Nogueira LC. Exploratory analysis of 5 supervised machine learning models for predicting the efficacy of the endogenous pain inhibitory pathway in patients with musculoskeletal pain. Musculoskelet Sci Pract 2023; 66:102788. [PMID: 37315499 DOI: 10.1016/j.msksp.2023.102788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 05/09/2023] [Accepted: 06/05/2023] [Indexed: 06/16/2023]
Abstract
OBJECTIVES The identification of factors that influence the efficacy of endogenous pain inhibitory pathways remains challenging due to different protocols and populations. We explored five machine learning (ML) models to estimate the Conditioned Pain Modulation (CPM) efficacy. DESIGN Exploratory, cross-sectional design. SETTING AND PARTICIPANTS This study was conducted in an outpatient setting and included 311 patients with musculoskeletal pain. METHODS Data collection included sociodemographic, lifestyle, and clinical characteristics. CPM efficacy was calculated by comparing the pressure pain thresholds before and after patients submerged their non-dominant hand in a bucket of cold water (cold-pressure test) (1-4 °C). We developed five ML models: decision tree, random forest, gradient-boosted trees, logistic regression, and support vector machine. MAIN OUTCOME MEASURES Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, precision, recall, F1-score, and the Matthews Correlation Coefficient (MCC). To interpret and explain the predictions, we used SHapley Additive explanation values and Local Interpretable Model-Agnostic Explanations. RESULTS The XGBoost model presented the highest performance with an accuracy of 0.81 (95% CI = 0.73 to 0.89), F1 score of 0.80 (95% CI = 0.74 to 0.87), AUC of 0.81 (95% CI: 0.74 to 0.88), MCC of 0.61, and Kappa of 0.61. The model was influenced by duration of pain, fatigue, physical activity, and the number of painful areas. CONCLUSIONS XGBoost showed potential in predicting the CPM efficacy in patients with musculoskeletal pain on our dataset. Further research is needed to ensure the external validity and clinical utility of this model.
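Several of the outcome measures listed here, including the Kappa reported in the results, follow directly from a 2x2 confusion matrix. The sketch below uses the standard definitions with hypothetical counts; it is not the authors' evaluation code and it omits the confidence intervals, whose method the abstract does not describe.

```python
def classification_report(tp, fn, fp, tn):
    """Accuracy, sensitivity (recall), specificity, precision, F1 and Cohen's kappa."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / n
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what the marginal totals give by chance.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }

# Hypothetical counts: 40 TP, 10 FN, 10 FP, 40 TN.
report = classification_report(40, 10, 10, 40)
```

Sensitivity and recall are the same quantity, so a list that names both reports one number twice.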
Affiliation(s)
- Felipe J J Reis
- Physical Therapy Department, Instituto Federal do Rio de Janeiro (IFRJ), Rio de Janeiro, Brazil; Postgraduate Program in Clinical Medicine, Universidade Federal do Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil; Pain in Motion Research Group, Department of Physiotherapy, Human Physiology and Anatomy, Faculty of Physical Education & Physiotherapy, Vrije Universiteit Brussel, Brussels, Belgium.
| | - Juliana Valentim Bittencourt
- Postgraduate Program in Rehabilitation Sciences, Centro Universitário Augusto Motta (UNISUAM), Rio de Janeiro, Brazil
| | - Arthur de Sá Ferreira
- Postgraduate Program in Rehabilitation Sciences, Centro Universitário Augusto Motta (UNISUAM), Rio de Janeiro, Brazil
| | - Ney Meziat-Filho
- Postgraduate Program in Rehabilitation Sciences, Centro Universitário Augusto Motta (UNISUAM), Rio de Janeiro, Brazil
| | - Leandro C Nogueira
- Physical Therapy Department, Instituto Federal do Rio de Janeiro (IFRJ), Rio de Janeiro, Brazil; Postgraduate Program in Rehabilitation Sciences, Centro Universitário Augusto Motta (UNISUAM), Rio de Janeiro, Brazil
26
|
Beneyto M, Ghyaza G, Cariou E, Amar J, Lairez O. Development and validation of machine learning algorithms to predict posthypertensive origin in left ventricular hypertrophy. Arch Cardiovasc Dis 2023; 116:397-402. [PMID: 37474391 DOI: 10.1016/j.acvd.2023.06.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/26/2023] [Revised: 06/12/2023] [Accepted: 06/19/2023] [Indexed: 07/22/2023]
Abstract
BACKGROUND Left ventricular hypertrophy is often associated with hypertension, which is not necessarily the cause of hypertrophy. Non-hypertension-related aetiologies often have a strong impact on patient management, and therefore require a thorough and careful workup. When considering all left ventricular hypertrophies, even the mild ones, the number of patients who need a workup increases drastically. This raises the need for a tool to evaluate the pretest probability of the origin of left ventricular hypertrophy. AIM To predict the hypertensive origin of left ventricular hypertrophy using machine learning on first-line clinical, laboratory and echocardiographic variables. METHODS We used a retrospective single-centre population of 591 patients with left ventricular hypertrophy, starting at 12 mm maximal left ventricular wall thickness. After splitting the data into training and testing sets, we trained three different algorithms: decision tree; random forest; and support vector machine. Model performance was validated on the testing set. RESULTS All models exhibited good areas under receiver operating characteristic curves: 0.82 (95% confidence interval: 0.77-0.88) for the decision tree; 0.90 (95% confidence interval 0.85-0.94) for the random forest; and 0.90 (95% confidence interval: 0.85-0.94) for the support vector machine. After threshold selection, the last model had the best balance between its specificity of 0.96 (95% confidence interval: 0.91-0.99) and its sensitivity of 0.31 (95% confidence interval: 0.17-0.44). All algorithms relied on similar most influential predictor variables. Online calculators were developed and made publicly available. CONCLUSIONS Machine learning models were able to determine the hypertensive origin of left ventricular hypertrophy with good performance. Implementation in clinical practice could reduce the number of aetiological workups needed in patients presenting with left ventricular hypertrophy.
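Threshold selection trades sensitivity against specificity: raising the cut-off on a model's score can only keep or increase specificity and keep or decrease sensitivity. The abstract does not say which rule was used, so the sketch below shows one plausible rule as an assumption: take the lowest threshold that reaches a target specificity, which is the most sensitive operating point meeting that target. Labels, scores, and the target value are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Call score >= threshold positive; return (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def pick_threshold(labels, scores, min_specificity=0.95):
    """Lowest observed score that meets the specificity target, or None."""
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(labels, scores, threshold)
        if spec >= min_specificity:
            return threshold, sens, spec
    return None

# Hypothetical model outputs for three positive and four negative cases.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
choice = pick_threshold(labels, scores, min_specificity=0.75)
```

A high specificity target naturally yields a low sensitivity, the same pattern as the 0.96 and 0.31 pair reported above.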
Affiliation(s)
- Maxime Beneyto
  - Cardiac Imaging Centre, Toulouse University Hospital, 31059 Toulouse, France
- Ghada Ghyaza
  - Department of Hypertension, Toulouse University Hospital, 31059 Toulouse, France
- Eve Cariou
  - Cardiac Imaging Centre, Toulouse University Hospital, 31059 Toulouse, France
- Jacques Amar
  - Department of Hypertension, Toulouse University Hospital, 31059 Toulouse, France
- Olivier Lairez
  - Cardiac Imaging Centre, Toulouse University Hospital, 31059 Toulouse, France
27
Goossens Q, Locsin M, Gharehbaghi S, Brito P, Moise E, Ponder LA, Inan OT, Prahalad S. Knee acoustic emissions as a noninvasive biomarker of articular health in patients with juvenile idiopathic arthritis: a clinical validation in an extended study population. Pediatr Rheumatol Online J 2023; 21:59. [PMID: 37340311 DOI: 10.1186/s12969-023-00842-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 06/03/2023] [Indexed: 06/22/2023] Open
Abstract
BACKGROUND Joint acoustic emissions from knees have been evaluated as a convenient, non-invasive digital biomarker of inflammatory knee involvement in a small cohort of children with Juvenile Idiopathic Arthritis (JIA). The objective of the present study was to validate this in a larger cohort. FINDINGS A total of 116 subjects (86 JIA and 30 healthy controls) participated in this study. Of the 86 subjects with JIA, 43 subjects had active knee involvement at the time of study. Joint acoustic emissions were bilaterally recorded, and corresponding signal features were used to train a machine learning algorithm (XGBoost) to classify JIA and healthy knees. All active JIA knees and 80% of the controls were used as training data set, while the remaining knees were used as testing data set. Leave-one-leg-out cross-validation was used for validation on the training data set. Validation on the training and testing set of the classifier resulted in an accuracy of 81.1% and 87.7% respectively. Sensitivity / specificity for the training and testing validation was 88.6% / 72.3% and 88.1% / 83.3%, respectively. The area under the curve of the receiver operating characteristic curve was 0.81 for the developed classifier. The distributions of the joint scores of the active and inactive knees were significantly different. CONCLUSION Joint acoustic emissions can serve as an inexpensive and easy-to-use digital biomarker to distinguish JIA from healthy controls. Utilizing serial joint acoustic emission recordings can potentially help monitor disease activity in JIA affected joints to enable timely changes in therapy.
Affiliation(s)
- Quentin Goossens
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Technology Square Research Building, 85 Fifth St NW, Atlanta, GA, 30308, USA
- Miguel Locsin
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, 30223, USA
- Sevda Gharehbaghi
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Technology Square Research Building, 85 Fifth St NW, Atlanta, GA, 30308, USA
- Priya Brito
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, 30223, USA
- Emily Moise
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Technology Square Research Building, 85 Fifth St NW, Atlanta, GA, 30308, USA
- Lori A Ponder
  - Children's Healthcare of Atlanta, Atlanta, GA, 30223, USA
- Omer T Inan
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Technology Square Research Building, 85 Fifth St NW, Atlanta, GA, 30308, USA
- Sampath Prahalad
  - Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, 30223, USA
  - Children's Healthcare of Atlanta, Atlanta, GA, 30223, USA
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30223, USA
28
Leontidou K, Rubel V, Stoeck T. Comparing quantile regression spline analyses and supervised machine learning for environmental quality assessment at coastal marine aquaculture installations. PeerJ 2023; 11:e15425. [PMID: 37334127 PMCID: PMC10274583 DOI: 10.7717/peerj.15425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Accepted: 04/25/2023] [Indexed: 06/20/2023] Open
Abstract
Organic enrichment associated with marine finfish aquaculture is a local stressor of marine coastal ecosystems. To maintain ecosystem services, the implementation of biomonitoring programs focusing on benthic diversity is required. Traditionally, impact-indices are determined by extracting and identifying benthic macroinvertebrates from samples. However, this is a time-consuming and expensive method with low upscaling potential. A more rapid, inexpensive, and robust method to infer the environmental quality of marine environments is eDNA metabarcoding of bacterial communities. To infer the environmental quality of coastal habitats from metabarcoding data, two taxonomy-free approaches have been successfully applied for different geographical regions and monitoring goals, namely quantile regression splines (QRS) and supervised machine learning (SML). However, their comparative performance remains untested for monitoring the impact of organic enrichment introduced by aquaculture on marine coastal environments. We compared the performance of QRS and SML using bacterial metabarcoding data to infer the environmental quality of 230 aquaculture samples collected from seven farms in Norway and seven farms in Scotland along an organic enrichment gradient. As a measure of environmental quality, we used the Infaunal Quality Index (IQI) calculated from benthic macrofauna data (reference index). The QRS analysis plotted the abundance of amplicon sequence variants (ASVs) as a function to the IQI from which the ASVs with a defined abundance peak were assigned to eco-groups and a molecular IQI was subsequently calculated. In contrast, the SML approach built a random forest model to directly predict the macrofauna-based IQI. Our results show that both QRS and SML perform well in inferring the environmental quality with 89% and 90% accuracy, respectively. 
For both geographic regions, there was high correspondence between the reference IQI and both the inferred molecular IQIs (p < 0.001), with the SML model showing a higher coefficient of determination compared to QRS. Among the 20 most important ASVs identified by the SML approach, 15 were congruent with the good quality spline ASV indicators identified via QRS for both Norwegian and Scottish salmon farms. More research on the response of the ASVs to organic enrichment and the co-influence of other environmental parameters is necessary to eventually select the most powerful stressor-specific indicators. Even though both approaches are promising to infer environmental quality based on metabarcoding data, SML showed to be more powerful in handling the natural variability. For the improvement of the SML model, addition of new samples is still required, as background noise introduced by high spatio-temporal variability can be reduced. Overall, we recommend the development of a powerful SML approach that will be onwards applied for monitoring the impact of aquaculture on marine ecosystems based on eDNA metabarcoding data.
29
Huang W, Suominen H, Liu T, Rice G, Salomon C, Barnard AS. Explainable discovery of disease biomarkers: The case of ovarian cancer to illustrate the best practice in machine learning and Shapley analysis. J Biomed Inform 2023; 141:104365. [PMID: 37062419 DOI: 10.1016/j.jbi.2023.104365] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/22/2022] [Revised: 03/24/2023] [Accepted: 04/10/2023] [Indexed: 04/18/2023]
Abstract
OBJECTIVE Ovarian cancer is a significant health issue with lasting impacts on the community. Despite recent advances in surgical, chemotherapeutic and radiotherapeutic interventions, they have had only marginal impacts due to an inability to identify biomarkers at an early stage. Biomarker discovery is challenging, yet essential for improving drug discovery and clinical care. Machine learning (ML) techniques are invaluable for recognising complex patterns in biomarkers compared to conventional methods, yet they can lack physical insights into diagnosis. eXplainable Artificial Intelligence (XAI) is capable of providing deeper insights into the decision-making of complex ML algorithms increasing their applicability. We aim to introduce best practice for combining ML and XAI techniques for biomarker validation tasks. METHODS We focused on classification tasks and a game theoretic approach based on Shapley values to build and evaluate models and visualise results. We described the workflow and apply the pipeline in a case study using the CDAS PLCO Ovarian Biomarkers dataset to demonstrate the potential for accuracy and utility. RESULTS The case study results demonstrate the efficacy of the ML pipeline, its consistency, and advantages compared to conventional statistical approaches. CONCLUSION The resulting guidelines provide a general framework for practical application of XAI in medical research that can inform clinicians and validate and explain cancer biomarkers.
Affiliation(s)
- Weitong Huang
  - School of Computing, Australian National University, Acton, ACT 2601, Australia
- Hanna Suominen
  - School of Computing, Australian National University, Acton, ACT 2601, Australia
  - Department of Computing, University of Turku, Turku, Finland
- Tommy Liu
  - School of Computing, Australian National University, Acton, ACT 2601, Australia
- Gregory Rice
  - Exosome Biology Laboratory, Centre for Clinical Diagnostics, University of Queensland Centre for Clinical Research, Royal Brisbane and Women's Hospital, Faculty of Medicine, The University of Queensland, Brisbane, Australia
  - Inoviq Limited, Notting Hill, Australia
- Carlos Salomon
  - Exosome Biology Laboratory, Centre for Clinical Diagnostics, University of Queensland Centre for Clinical Research, Royal Brisbane and Women's Hospital, Faculty of Medicine, The University of Queensland, Brisbane, Australia
  - Translational Extracellular Vesicles in Obstetrics and Gynae-Oncology Group, Centre for Clinical Diagnostics, University of Queensland Centre for Clinical Research, Royal Brisbane and Women's Hospital, Faculty of Medicine, The University of Queensland, Brisbane, Australia
- Amanda S Barnard
  - School of Computing, Australian National University, Acton, ACT 2601, Australia
30
Abpeikar S, Kasmarik K. Motion behaviour recognition dataset collected from human perception of collective motion behaviour. Data Brief 2023; 47:108976. [PMID: 36875220 PMCID: PMC9975684 DOI: 10.1016/j.dib.2023.108976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 01/23/2023] [Accepted: 02/07/2023] [Indexed: 02/17/2023] Open
Abstract
Collective motion behaviour such as the movement of swarming bees, flocking birds or schooling fish has inspired computer-based swarming systems. They are widely used in agent formation control, including aerial and ground vehicles, teams of rescue robots, and exploration of dangerous environments with groups of robots. Collective motion behaviour is easy to describe, but highly subjective to detect. Humans can easily recognise these behaviours; however, it is hard for a computer system to recognise them. Since humans can easily recognise these behaviours, ground truth data from human perception is one way to enable machine learning methods to mimic this human perception. Hence ground truth data has been collected from human perception of collective motion behaviour recognition by running an online survey. In this survey, participants provide their opinion about the behaviour of 'boid' point masses. Each question of the survey contains a short video (around 10 seconds), captured from simulated boid movements. Participants were asked to drag a slider to label each video as either 'flocking' or 'not flocking'; 'aligned' or 'not aligned' or 'grouped' or 'not grouped'. By averaging these responses, three binary labels were created for each video. This data has been analysed to confirm that it is possible for a machine to learn binary classification labels from the human perception of collective behaviour dataset with high accuracy.
31
Sieg M, Roselló Atanet I, Tomova MT, Schoeneberg U, Sehy V, Mäder P, März M. Discovering unknown response patterns in progress test data to improve the estimation of student performance. BMC Med Educ 2023; 23:193. [PMID: 36978145 PMCID: PMC10053036 DOI: 10.1186/s12909-023-04172-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 03/17/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND The Progress Test Medizin (PTM) is a 200-question formative test that is administered to approximately 11,000 students at medical universities (Germany, Austria, Switzerland) each term. Students receive feedback on their knowledge (development) mostly in comparison to their own cohort. In this study, we use the data of the PTM to find groups with similar response patterns. METHODS We performed k-means clustering with a dataset of 5,444 students, selected cluster number k = 5, and answers as features. Subsequently, the data was passed to XGBoost with the cluster assignment as target enabling the identification of cluster-relevant questions for each cluster with SHAP. Clusters were examined by total scores, response patterns, and confidence level. Relevant questions were evaluated for difficulty index, discriminatory index, and competence levels. RESULTS Three of the five clusters can be seen as "performance" clusters: cluster 0 (n = 761) consisted predominantly of students close to graduation. Relevant questions tend to be difficult, but students answered confidently and correctly. Students in cluster 1 (n = 1,357) were advanced, cluster 3 (n = 1,453) consisted mainly of beginners. Relevant questions for these clusters were rather easy. The number of guessed answers increased. There were two "drop-out" clusters: students in cluster 2 (n = 384) dropped out of the test about halfway through after initially performing well; cluster 4 (n = 1,489) included students from the first semesters as well as "non-serious" students both with mostly incorrect guesses or no answers. CONCLUSION Clusters placed performance in the context of participating universities. Relevant questions served as good cluster separators and further supported our "performance" cluster groupings.
Affiliation(s)
- Miriam Sieg
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany
- Iván Roselló Atanet
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany
- Mihaela Todorova Tomova
  - Fakultät für Informatik und Automatisierung, Data-Intensive Systems and Visualization Group (dAI.SY), Technische Universität Ilmenau, Ehrenbergstraße 29, 98693, Ilmenau, Germany
- Uwe Schoeneberg
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany
- Victoria Sehy
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany
- Patrick Mäder
  - Fakultät für Informatik und Automatisierung, Data-Intensive Systems and Visualization Group (dAI.SY), Technische Universität Ilmenau, Ehrenbergstraße 29, 98693, Ilmenau, Germany
  - Fakultät für Biowissenschaften, Friedrich Schiller Universität Jena, Schloßgasse 10, 07743, Jena, Germany
- Maren März
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany
32
Hoffman H, Wood JS, Cote JR, Jalal MS, Masoud HE, Gould GC. Machine learning prediction of malignant middle cerebral artery infarction after mechanical thrombectomy for anterior circulation large vessel occlusion. J Stroke Cerebrovasc Dis 2023; 32:106989. [PMID: 36652789 DOI: 10.1016/j.jstrokecerebrovasdis.2023.106989] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/12/2022] [Revised: 01/08/2023] [Accepted: 01/09/2023] [Indexed: 01/18/2023] Open
Abstract
OBJECTIVE Prediction of malignant middle cerebral artery infarction (MMI) could identify patients for early intervention. We trained and internally validated a ML model that predicts MMI following mechanical thrombectomy (MT) for ACLVO. METHODS All patients who underwent MT for ACLVO between 2015 - 2021 at a single institution were reviewed. Data was divided into 80% training and 20% test sets. 10 models were evaluated on the training set. The top 3 models underwent hyperparameter tuning using grid search with nested 5-fold CV to optimize the area under the receiver operating curve (AUROC). Tuned models were evaluated on the test set and compared to logistic regression. RESULTS A total of 381 patients met the inclusion criteria. There were 50 (13.1%) patients who developed MMI. Out of the 10 ML models screened on the training set, the top 3 performing were neural network (median AUROC 0.78, IQR 0.72 - 0.83), support vector machine ([SVM] median AUROC 0.77, IQR 0.72 - 0.83), and random forest (median AUROC 0.75, IQR 0.68 - 0.81). On the test set, random forest (median AUROC 0.78, IQR 0.73 - 0.83) and neural network (median AUROC 0.78, IQR 0.73 - 0.83) were the top performing models, followed by SVM (median AUROC 0.77, IQR 0.70 - 0.83). These scores were significantly better than those for logistic regression (AUROC 0.72, IQR 0.66 - 0.78), individual risk factors, and the Malignant Brain Edema score (p < 0.001 for all). CONCLUSION ML models predicted MMI with good discriminative ability. They outperformed standard statistical techniques and individual risk factors.
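As a rough illustration of the metric reported in this abstract (not the authors' code), the AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The minimal Python sketch below computes it by pairwise comparison and summarises repeated estimates as a median with interquartile range. The function names `auroc` and `median_iqr` and the toy data are hypothetical, chosen only for this example.

```python
import statistics


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_iqr(values):
    """Median and (Q1, Q3) of a list of estimates, e.g. AUROCs from resampling."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), (q1, q3)


# Toy example (hypothetical scores): 2 negatives and 2 positives.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auroc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based formulation would be the usual choice for larger data.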
Affiliation(s)
- Haydn Hoffman
  - Department of Neurosurgery, State University of New York Upstate Medical University, Syracuse, NY, USA
- Jacob S Wood
  - Department of Neurosurgery, State University of New York Upstate Medical University, Syracuse, NY, USA
- John R Cote
  - Department of Neurosurgery, State University of New York Upstate Medical University, Syracuse, NY, USA
- Muhammad S Jalal
  - Department of Neurosurgery, State University of New York Upstate Medical University, Syracuse, NY, USA
- Hesham E Masoud
  - Department of Neurology, State University of New York Upstate Medical University, Syracuse, NY, USA
- Grahame C Gould
  - Department of Neurosurgery, State University of New York Upstate Medical University, Syracuse, NY, USA
33
Kantidakis G, Putter H, Litière S, Fiocco M. Statistical models versus machine learning for competing risks: development and validation of prognostic models. BMC Med Res Methodol 2023; 23:51. [PMID: 36829145 PMCID: PMC9951458 DOI: 10.1186/s12874-023-01866-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/15/2022] [Accepted: 02/13/2023] [Indexed: 02/26/2023] Open
Abstract
BACKGROUND In health research, several chronic diseases are susceptible to competing risks (CRs). Initially, statistical models (SM) were developed to estimate the cumulative incidence of an event in the presence of CRs. As recently there is a growing interest in applying machine learning (ML) for clinical prediction, these techniques have also been extended to model CRs but literature is limited. Here, our aim is to investigate the potential role of ML versus SM for CRs within non-complex data (small/medium sample size, low dimensional setting). METHODS A dataset with 3826 retrospectively collected patients with extremity soft-tissue sarcoma (eSTS) and nine predictors is used to evaluate model-predictive performance in terms of discrimination and calibration. Two SM (cause-specific Cox, Fine-Gray) and three ML techniques are compared for CRs in a simple clinical setting. ML models include an original partial logistic artificial neural network for CRs (PLANNCR original), a PLANNCR with novel specifications in terms of architecture (PLANNCR extended), and a random survival forest for CRs (RSFCR). The clinical endpoint is the time in years between surgery and disease progression (event of interest) or death (competing event). Time points of interest are 2, 5, and 10 years. RESULTS Based on the original eSTS data, 100 bootstrapped training datasets are drawn. Performance of the final models is assessed on validation data (left out samples) by employing as measures the Brier score and the Area Under the Curve (AUC) with CRs. Miscalibration (absolute accuracy error) is also estimated. Results show that the ML models are able to reach a comparable performance versus the SM at 2, 5, and 10 years regarding both Brier score and AUC (95% confidence intervals overlapped). However, the SM are frequently better calibrated. 
CONCLUSIONS Overall, ML techniques are less practical as they require substantial implementation time (data preprocessing, hyperparameter tuning, computational intensity), whereas regression methods can perform well without the additional workload of model training. As such, for non-complex real life survival data, these techniques should only be applied complementary to SM as exploratory tools of model's performance. More attention to model calibration is urgently needed.
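To make the evaluation scheme in this abstract more concrete, the sketch below shows a plain Brier score and a single bootstrap split into a training sample and its left-out cases. It is a simplified illustration under stated assumptions, not the study's method: it treats the outcome as a binary event at a fixed time point and ignores censoring and competing risks, both of which the time-dependent measures used in the study have to handle. The names `brier_score` and `bootstrap_split` are hypothetical.

```python
import random


def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means perfect predictions.
    Simplification: no censoring and no competing events.
    """
    if len(outcomes) != len(probabilities) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


def bootstrap_split(n, rng):
    """Draw n indices with replacement; cases never drawn form the left-out set."""
    train = [rng.randrange(n) for _ in range(n)]
    in_bag = set(train)
    left_out = [i for i in range(n) if i not in in_bag]
    return train, left_out


# Hypothetical usage: one bootstrap replicate over 10 patients.
rng = random.Random(0)  # fixed seed so the split is reproducible
train_idx, left_out_idx = bootstrap_split(10, rng)
print(len(train_idx), sorted(set(train_idx) & set(left_out_idx)))  # 10 []
```

Repeating the split (100 times in the abstract), fitting on each training sample and scoring on its left-out cases gives a distribution of scores from which confidence intervals can be derived.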
Affiliation(s)
- Georgios Kantidakis
  - Mathematical Institute (MI) Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
  - Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
  - Department of Statistics, European Organisation for Research and Treatment of Cancer (EORTC) Headquarters, Ave E. Mounier 83/11, 1200, Brussels, Belgium
- Hein Putter
  - Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Saskia Litière
  - Department of Statistics, European Organisation for Research and Treatment of Cancer (EORTC) Headquarters, Ave E. Mounier 83/11, 1200, Brussels, Belgium
- Marta Fiocco
  - Mathematical Institute (MI) Leiden University, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
  - Department of Biomedical Data Sciences, Section Medical Statistics, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
  - Trial and Data Center, Princess Máxima Center for pediatric oncology (PMC), Heidelberglaan 25, 3584 CS, Utrecht, the Netherlands
34
Chicco D, Jurman G. Ten simple rules for providing bioinformatics support within a hospital. BioData Min 2023; 16:6. [PMID: 36823520 PMCID: PMC9948383 DOI: 10.1186/s13040-023-00326-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 02/17/2023] [Indexed: 02/25/2023] Open
Abstract
Bioinformatics has become a key aspect of the biomedical research programmes of many hospitals' scientific centres, and the establishment of bioinformatics facilities within hospitals has become a common practice worldwide. Bioinformaticians working in these facilities provide computational biology support to medical doctors and principal investigators who are daily dealing with data of patients to analyze. These bioinformatics analysts, although pivotal, usually do not receive formal training for this job. We therefore propose these ten simple rules to guide these bioinformaticians in their work: ten pieces of advice on how to provide bioinformatics support to medical doctors in hospitals. We believe these simple rules can help bioinformatics facility analysts in producing better scientific results and work in a serene and fruitful environment.
Affiliation(s)
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, 155 College Street, M5T 3M7, Toronto, Ontario, Canada
- Giuseppe Jurman
  - Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy (GRID grid.11469.3b; ISNI 0000 0000 9780 0901)
35
Barth J, Lohse KR, Bland MD, Lang CE. Predicting later categories of upper limb activity from earlier clinical assessments following stroke: an exploratory analysis. J Neuroeng Rehabil 2023; 20:24. [PMID: 36810072 PMCID: PMC9945671 DOI: 10.1186/s12984-023-01148-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 02/14/2023] [Indexed: 02/23/2023] Open
Abstract
BACKGROUND Accelerometers allow for direct measurement of upper limb (UL) activity. Recently, multi-dimensional categories of UL performance have been formed to provide a more complete measure of UL use in daily life. Prediction of motor outcomes after stroke have tremendous clinical utility and a next step is to explore what factors might predict someone's subsequent UL performance category. PURPOSE To explore how different machine learning techniques can be used to understand how clinical measures and participant demographics captured early after stroke are associated with the subsequent UL performance categories. METHODS This study analyzed data from two time points from a previous cohort (n = 54). Data used was participant characteristics and clinical measures from early after stroke and a previously established category of UL performance at a later post stroke time point. Different machine learning techniques (a single decision tree, bagged trees, and random forests) were used to build predictive models with different input variables. Model performance was quantified with the explanatory power (in-sample accuracy), predictive power (out-of-bag estimate of error), and variable importance. RESULTS A total of seven models were built, including one single decision tree, three bagged trees, and three random forests. Measures of UL impairment and capacity were the most important predictors of the subsequent UL performance category, regardless of the machine learning algorithm used. Other non-motor clinical measures emerged as key predictors, while participant demographics predictors (with the exception of age) were generally less important across the models. Models built with the bagging algorithms outperformed the single decision tree for in-sample accuracy (26-30% better classification) but had only modest cross-validation accuracy (48-55% out of bag classification). 
CONCLUSIONS UL clinical measures were the most important predictors of the subsequent UL performance category in this exploratory analysis regardless of the machine learning algorithm used. Interestingly, cognitive and affective measures emerged as important predictors when the number of input variables was expanded. These results reinforce that UL performance, in vivo, is not a simple product of body functions nor the capacity for movement, instead being a complex phenomenon dependent on many physiological and psychological factors. Utilizing machine learning, this exploratory analysis is a productive step toward the prediction of UL performance. Trial registration NA.
Affiliation(s)
- Jessica Barth
  - Program in Physical Therapy, Washington University School of Medicine, St. Louis, MO, USA
- Keith R Lohse
  - Program in Physical Therapy, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Neurology, Washington University School of Medicine, St. Louis, MO, USA
- Marghuretta D Bland
  - Program in Physical Therapy, Washington University School of Medicine, St. Louis, MO, USA
  - Program in Occupational Therapy, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Neurology, Washington University School of Medicine, St. Louis, MO, USA
- Catherine E Lang
  - Program in Physical Therapy, Washington University School of Medicine, St. Louis, MO, USA
  - Program in Occupational Therapy, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Neurology, Washington University School of Medicine, St. Louis, MO, USA
36
Chicco D, Jurman G. The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification. BioData Min 2023; 16:4. [PMID: 36800973 PMCID: PMC9938573 DOI: 10.1186/s13040-023-00322-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 10/25/2022] [Accepted: 02/01/2023] [Indexed: 02/19/2023] Open
Abstract
Binary classification is a common task for which machine learning and computational statistics are used, and the area under the receiver operating characteristic curve (ROC AUC) has become the common standard metric to evaluate binary classifications in most scientific fields. The ROC curve has true positive rate (also called sensitivity or recall) on the y axis and false positive rate on the x axis, and the ROC AUC can range from 0 (worst result) to 1 (perfect result). The ROC AUC, however, has several flaws and drawbacks. This score is generated including predictions that obtained insufficient sensitivity and specificity, and moreover it does not say anything about positive predictive value (also known as precision) nor negative predictive value (NPV) obtained by the classifier, therefore potentially generating inflated overoptimistic results. Since it is common to include ROC AUC alone without precision and negative predictive value, a researcher might erroneously conclude that their classification was successful. Furthermore, a given point in the ROC space does not identify a single confusion matrix nor a group of matrices sharing the same MCC value. Indeed, a given (sensitivity, specificity) pair can cover a broad MCC range, which casts doubts on the reliability of ROC AUC as a performance measure. In contrast, the Matthews correlation coefficient (MCC) generates a high score in its [-1, +1] interval only if the classifier scored a high value for all the four basic rates of the confusion matrix: sensitivity, specificity, precision, and negative predictive value. A high MCC (for example, MCC [Formula: see text] 0.9), moreover, always corresponds to a high ROC AUC, and not vice versa. In this short study, we explain why the Matthews correlation coefficient should replace the ROC AUC as standard statistic in all the scientific studies involving a binary classification, in all scientific fields.
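The abstract's point that one (sensitivity, specificity) pair can correspond to very different MCC values can be checked with a few lines of arithmetic. The Python sketch below is an illustration only, not code from the article: it computes the four basic rates and the MCC from confusion-matrix counts, using the standard MCC formula, and compares two hypothetical confusion matrices that share sensitivity 0.9 and specificity 0.9 but differ in class balance. The function names and the counts are assumptions made for this example.

```python
import math


def confusion_rates(tp, fn, tn, fp):
    """The four basic rates of a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient, which lies in [-1, +1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: same sensitivity and specificity, different prevalence.
balanced = dict(tp=90, fn=10, tn=90, fp=10)      # 100 positives, 100 negatives
imbalanced = dict(tp=9, fn=1, tn=900, fp=100)    # 10 positives, 1000 negatives

for name, cm in (("balanced", balanced), ("imbalanced", imbalanced)):
    rates = confusion_rates(**cm)
    print(name, {k: round(v, 3) for k, v in rates.items()}, "MCC =", round(mcc(**cm), 3))
```

In the imbalanced case precision collapses because false positives outnumber true positives by more than ten to one, and the MCC drops accordingly even though the point in ROC space is unchanged.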
Affiliation(s)
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, 155 College Street, M5T 3M7, Toronto, Ontario, Canada.
- Giuseppe Jurman
  - Data Science for Health Unit, Fondazione Bruno Kessler, Via Sommarive 18, 38123 Povo, Trento, Italy
|
37
|
Choi S, Hill D, Young J, Cordeiro MF. Image processing and supervised machine learning for retinal microglia characterization in senescence. Methods Cell Biol 2023; 181:109-125. [PMID: 38302234 DOI: 10.1016/bs.mcb.2022.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
The process of senescence impairs the function of cells and can ultimately be a key factor in the development of disease. With an aging population, senescence-related diseases are increasing in prevalence. Therefore, understanding the mechanisms of cellular senescence within the central nervous system (CNS), including the retina, may yield new therapeutic pathways to slow or even prevent the development of neuro- and retinal degenerative diseases. One method of probing the changing functions of senescent retinal cells is to observe retinal microglial cells. Their morphological structure may change in response to their surrounding cellular environment. In this chapter, we show how microglial cells in the retina, which are implicated in aging and diseases of the CNS, can be identified, quantified, and classified into five distinct morphotypes using image processing and supervised machine learning algorithms. The process involves dissecting, staining, and mounting mouse retinas, before image capture via fluorescence microscopy. The resulting images can then be classified by morphotype using a support vector machine (SVM) we have recently described showing high accuracy. This SVM model uses shape metrics found to correspond with qualitative descriptions of the shape of each morphotype taken from existing literature. We encourage more objective and widespread use of methods of quantification such as this. We believe automatic delineation of the population of microglial cells in the retina could potentially lead to their use as retinal imaging biomarkers for disease prediction in the future.
Affiliation(s)
- Soyoung Choi
  - UCL Institute of Ophthalmology, London, United Kingdom; Novai Ltd, Reading, United Kingdom
- Daniel Hill
  - UCL Institute of Ophthalmology, London, United Kingdom
- Maria Francesca Cordeiro
  - UCL Institute of Ophthalmology, London, United Kingdom; Novai Ltd, Reading, United Kingdom; Imperial College Ophthalmology Research Group, Imperial College London, London, United Kingdom.
|
38
|
Lim PK, Julca I, Mutwil M. Redesigning plant specialized metabolism with supervised machine learning using publicly available reactome data. Comput Struct Biotechnol J 2023; 21:1639-1650. [PMID: 36874159 PMCID: PMC9976193 DOI: 10.1016/j.csbj.2023.01.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 01/12/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023] Open
Abstract
The immense structural diversity of products and intermediates of plant specialized metabolism (specialized metabolites) makes them rich sources of therapeutic medicine, nutrients, and other useful materials. With the rapid accumulation of reactome data that can be accessible on biological and chemical databases, along with recent advances in machine learning, this review sets out to outline how supervised machine learning can be used to design new compounds and pathways by exploiting the wealth of said data. We will first examine the various sources from which reactome data can be obtained, followed by explaining the different machine learning encoding methods for reactome data. We then discuss current supervised machine learning developments that can be employed in various aspects to help redesign plant specialized metabolism.
Affiliation(s)
- Peng Ken Lim
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Irene Julca
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
|
39
|
Afshin-Pour B, Qiu M, Hosseini Vajargah S, Cheyne H, Ha K, Stewart M, Horsky J, Aviv R, Zhang N, Narasimhan M, Chelico J, Musso G, Hajizadeh N. Discriminating Acute Respiratory Distress Syndrome from other forms of respiratory failure via iterative machine learning. Intell Based Med 2023; 7:100087. [PMID: 36624822 DOI: 10.1016/j.ibmed.2023.100087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2021] [Revised: 11/22/2022] [Accepted: 01/04/2023] [Indexed: 01/06/2023]
Abstract
Acute Respiratory Distress Syndrome (ARDS) is associated with high morbidity and mortality. Identification of ARDS enables lung protective strategies, quality improvement interventions, and clinical trial enrolment, but remains challenging particularly in the first 24 hours of mechanical ventilation. To address this we built an algorithm capable of discriminating ARDS from other similarly presenting disorders immediately following mechanical ventilation. Specifically, a clinical team examined medical records from 1263 ICU-admitted, mechanically ventilated patients, retrospectively assigning each patient a diagnosis of "ARDS" or "non-ARDS" (e.g., pulmonary edema). Exploiting data readily available in the clinical setting, including patient demographics, laboratory test results from before the initiation of mechanical ventilation, and features extracted by natural language processing of radiology reports, we applied an iterative pre-processing and machine learning framework. The resulting model successfully discriminated ARDS from non-ARDS causes of respiratory failure (AUC = 0.85) among patients meeting Berlin criteria for severe hypoxia. This analysis also highlighted novel patient variables that were informative for identifying ARDS in ICU settings.
|
40
|
Lyu M, Xin L, Jin H, Chitkushev LT, Zhang G, Keskin DB, Brusic V. Protocol for Classification Single-Cell PBMC Types from Pathological Samples Using Supervised Machine Learning. Methods Mol Biol 2023; 2673:53-67. [PMID: 37258906 DOI: 10.1007/978-1-0716-3239-0_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Peripheral blood mononuclear cells (PBMC) are mixed subpopulations of blood cells composed of five cell types. PBMC are widely used in the study of the immune system, infectious diseases, cancer, and vaccine development. Single-cell transcriptomics (SCT) allows the labeling of cell types by gene expression patterns from biological samples. Classifying cells into cell types and states is essential for single-cell analyses, especially in the classification of diseases and the assessment of therapeutic interventions, and for many secondary analyses. Most classification of cell types from SCT data uses unsupervised clustering or a combination of unsupervised and supervised methods including manual correction. In this chapter, we describe a protocol that uses supervised machine learning (ML) methods with SCT data for the classification of PBMC cell types in samples representing pathological states. This protocol has three parts: (1) data preprocessing, (2) labeling of reference PBMC SCT datasets and training supervised ML models, and (3) labeling new PBMC datasets from disease samples. This protocol enables building classification models that are of high accuracy and efficiency. Our example focuses on 10× Genomics technology but applies to datasets from other SCT platforms.
Affiliation(s)
- Minjie Lyu
  - School of Computer Science, University of Nottingham, Ningbo, Zhejiang, China
- Lin Xin
  - School of Computer Science, University of Nottingham, Ningbo, Zhejiang, China
- Huan Jin
  - School of Computer Science, University of Nottingham, Ningbo, Zhejiang, China
- Lou T Chitkushev
  - Department of Computer Science, Metropolitan College, Boston University, Boston, MA, USA
- Guanglan Zhang
  - Department of Computer Science, Metropolitan College, Boston University, Boston, MA, USA
- Derin B Keskin
  - Translational Immuno-Genomics Lab, Dana-Farber Cancer Institute, Boston, MA, USA
- Vladimir Brusic
  - School of Computer Science, University of Nottingham, Ningbo, Zhejiang, China.
|
41
|
Lindholz M, Schellenberg CM, Grunow JJ, Kagerbauer S, Milnik A, Zickler D, Angermair S, Reißhauer A, Witzenrath M, Menk M, Boie S, Balzer F, Schaller SJ. Mobilisation of critically ill patients receiving norepinephrine: a retrospective cohort study. Crit Care 2022; 26:362. [PMID: 36434724 PMCID: PMC9700948 DOI: 10.1186/s13054-022-04245-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Accepted: 11/15/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Mobilisation and exercise intervention in general are safe and feasible in critically ill patients. For patients requiring catecholamines, however, doses of norepinephrine safe for mobilisation in the intensive care unit (ICU) are not defined. This study aimed to describe mobilisation practice in our hospital and identify doses of norepinephrine that allowed a safe mobilisation. METHODS We conducted a retrospective single-centre cohort study of 16 ICUs at a university hospital in Germany with patients admitted between March 2018 and November 2021. Data were collected from our patient data management system. We analysed the effect of norepinephrine on level (ICU Mobility Scale) and frequency (units per day) of mobilisation, early mobilisation (within 72 h of ICU admission), mortality, and rate of adverse events. Data were extracted from free-text mobilisation entries using supervised machine learning (support vector machine). Statistical analyses were done using (generalised) linear (mixed-effect) models, as well as chi-square tests and ANOVAs. RESULTS A total of 12,462 patients were analysed in this study. They received a total of 59,415 mobilisation units. Of these patients, 842 (6.8%) received mobilisation under continuous norepinephrine administration. Norepinephrine administration was negatively associated with the frequency of mobilisation (adjusted difference -0.07 mobilisations per day; 95% CI -0.09, -0.05; p ≤ 0.001) and early mobilisation (adjusted OR 0.83; 95% CI 0.76, 0.90; p ≤ 0.001), while a higher norepinephrine dose corresponded to a lower chance to be mobilised out-of-bed (adjusted OR 0.01; 95% CI 0.00, 0.04; p ≤ 0.001). Mobilisation with norepinephrine did not significantly affect mortality (p > 0.1). Higher compared to lower doses of norepinephrine did not lead to a significant increase in adverse events in our practice (p > 0.1). We identified that mobilisation was safe with up to 0.20 µg/kg/min norepinephrine for out-of-bed (IMS ≥ 2) and 0.33 µg/kg/min for in-bed (IMS 0-1) mobilisation. CONCLUSIONS Mobilisation with norepinephrine can be done safely when considering the status of the patient and safety guidelines. We demonstrated that safe mobilisation was possible with norepinephrine doses up to 0.20 µg/kg/min for out-of-bed (IMS ≥ 2) and 0.33 µg/kg/min for in-bed (IMS 0-1) mobilisation.
Affiliation(s)
- Maximilian Lindholz
  - Department of Anesthesiology and Operative Intensive Care Medicine (CVK, CCM), Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Clara M. Schellenberg
  - Department of Anesthesiology and Operative Intensive Care Medicine (CVK, CCM), Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Julius J. Grunow
  - Department of Anesthesiology and Operative Intensive Care Medicine (CVK, CCM), Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Simone Kagerbauer
  - Department of Anesthesiology and Intensive Care, School of Medicine, Technical University of Munich, Munich, Germany
  - Department of Anesthesiology and Intensive Care Medicine, Ulm University, Ulm, Germany
- Annette Milnik
  - Division of Molecular Neuroscience, University of Basel, Basel, Switzerland
- Daniel Zickler
  - Department of Nephrology and Medical Intensive Care, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Stefan Angermair
  - Department of Anesthesiology and Operative Intensive Care Medicine (CBF), Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Anett Reißhauer
  - Department of Physical Medicine, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Martin Witzenrath
  - Department of Infectious Diseases and Pulmonary Medicine, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Mario Menk
  - Department of Anesthesiology and Operative Intensive Care Medicine (CVK, CCM), Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
  - Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Sebastian Boie
  - Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Felix Balzer
  - Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
- Stefan J. Schaller
  - Department of Anesthesiology and Operative Intensive Care Medicine (CVK, CCM), Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität Zu Berlin, Berlin, Germany
  - Department of Anesthesiology and Intensive Care, School of Medicine, Technical University of Munich, Munich, Germany
|
42
|
De Backer P, Eckhoff JA, Simoens J, Müller DT, Allaeys C, Creemers H, Hallemeesch A, Mestdagh K, Van Praet C, Debbaut C, Decaestecker K, Bruns CJ, Meireles O, Mottrie A, Fuchs HF. Multicentric exploration of tool annotation in robotic surgery: lessons learned when starting a surgical artificial intelligence project. Surg Endosc 2022; 36:8533-8548. [PMID: 35941310 DOI: 10.1007/s00464-022-09487-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Accepted: 07/16/2022] [Indexed: 01/06/2023]
Abstract
BACKGROUND Artificial intelligence (AI) holds tremendous potential to reduce surgical risks and improve surgical assessment. Machine learning, a subfield of AI, can be used to analyze surgical video and imaging data. Manual annotations provide veracity about the desired target features. Yet, methodological annotation explorations are limited to date. Here, we provide an exploratory analysis of the requirements and methods of instrument annotation in a multi-institutional team from two specialized AI centers and compile our lessons learned. METHODS We developed a bottom-up approach for team annotation of robotic instruments in robot-assisted partial nephrectomy (RAPN), which was subsequently validated in robot-assisted minimally invasive esophagectomy (RAMIE). Furthermore, instrument annotation methods were evaluated for their use in Machine Learning algorithms. Overall, we evaluated the efficiency and transferability of the proposed team approach and quantified performance metrics (e.g., time per frame required for each annotation modality) between RAPN and RAMIE. RESULTS We found a 0.05 Hz image sampling frequency to be adequate for instrument annotation. The bottom-up approach in annotation training and management resulted in accurate annotations and demonstrated efficiency in annotating large datasets. The proposed annotation methodology was transferrable between both RAPN and RAMIE. The average annotation time for RAPN pixel annotation ranged from 4.49 to 12.6 min per image; for vector annotation, we denote 2.92 min per image. Similar annotation times were found for RAMIE. Lastly, we elaborate on common pitfalls encountered throughout the annotation process. CONCLUSIONS We propose a successful bottom-up approach for annotator team composition, applicable to any surgical annotation project. Our results set the foundation to start AI projects for instrument detection, segmentation, and pose estimation. 
Due to the immense annotation burden resulting from spatial instrumental annotation, further analysis into sampling frequency and annotation detail needs to be conducted.
Affiliation(s)
- Pieter De Backer
  - ORSI Academy, Proefhoevestraat 12, 9090, Melle, Belgium.
  - Department of Human Structure and Repair, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium.
  - IBiTech-Biommeda, Faculty of Engineering and Architecture, and CRIG, Ghent University, Ghent, Belgium.
  - Department of Urology, Ghent University Hospital, Ghent, Belgium.
- Jennifer A Eckhoff
  - Robotic Innovation Laboratory, Department of General, Visceral, Tumor and Transplantsurgery, University Hospital Cologne, Cologne, Germany
- Jente Simoens
  - ORSI Academy, Proefhoevestraat 12, 9090, Melle, Belgium
- Dolores T Müller
  - Robotic Innovation Laboratory, Department of General, Visceral, Tumor and Transplantsurgery, University Hospital Cologne, Cologne, Germany
- Charlotte Allaeys
  - Department of Human Structure and Repair, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Heleen Creemers
  - Department of Human Structure and Repair, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Amélie Hallemeesch
  - Department of Human Structure and Repair, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Kenzo Mestdagh
  - Department of Human Structure and Repair, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Charlotte Debbaut
  - IBiTech-Biommeda, Faculty of Engineering and Architecture, and CRIG, Ghent University, Ghent, Belgium
- Christiane J Bruns
  - Robotic Innovation Laboratory, Department of General, Visceral, Tumor and Transplantsurgery, University Hospital Cologne, Cologne, Germany
- Ozanan Meireles
  - Surgical Artificial Intelligence and Innovation Laboratory, Massachusetts General Hospital, Boston, USA
- Alexandre Mottrie
  - ORSI Academy, Proefhoevestraat 12, 9090, Melle, Belgium
  - Department of Urology, OLV Hospital Aalst-Asse-Ninove, Aalst, Belgium
- Hans F Fuchs
  - Robotic Innovation Laboratory, Department of General, Visceral, Tumor and Transplantsurgery, University Hospital Cologne, Cologne, Germany
|
43
|
El Jai M, Zhar M, Ouazar D, Akhrif I, Saidou N. Socio-economic analysis of short-term trends of COVID-19: modeling and data analytics. BMC Public Health 2022; 22:1633. [PMID: 36038843 PMCID: PMC9421639 DOI: 10.1186/s12889-022-13788-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 07/12/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND COVID-19 caused a worldwide outbreak that brought most human activities to an abrupt halt. Stakeholders proposed multiple interventions to slow the disease, and a number of papers were devoted to understanding the pandemic, but fewer were oriented toward socio-economic analysis. In this paper, a socio-economic analysis is proposed to investigate the early-stage effect of socio-economic factors on COVID-19 spread. METHODS Fifty-two countries were selected for this study. A cascade algorithm was developed to extract the R0 number and the day J*; these quantities should decrease as the pandemic flattens. Subsequently, R0 and J* were modeled according to socio-economic factors using multilinear stepwise regression. RESULTS The findings demonstrated that low values of days before lockdown (DBLD) should flatten the pandemic by reducing J*. DBLD is the only parameter that can be tuned in the short term; the other socio-economic parameters cannot easily be adjusted as they are updated annually. Furthermore, the Elderly factor was highlighted as a major influence, especially because it is involved in the interaction terms of the R0 model. Simulations showed that the health care system could improve the pandemic damping for low Elderly values. In contrast, above a given Elderly value, the reproduction number R0 cannot be reduced even for developed countries (showing high HCI values), meaning that the disease's severity cannot be smoothed regardless of the performance of the corresponding health care system; non-pharmaceutical interventions are then expected to be more efficient than corrective measures. DISCUSSION The relationship between the socio-economic factors and the pandemic parameters R0 and J* is more complex than in the models proposed in the literature. The quadratic regression model proposed here ranked the most influential parameters in the following approximate order: DBLD, HCI, Elderly, Tav, CO2, and WC, as first-order, interaction, and second-order terms. CONCLUSIONS This modeling allowed the emergence of interaction terms that do not appear in similar studies, revealing a more complex relationship between the infection spread and the socio-economic factors. Future work will focus on enriching the datasets and optimizing the controlled parameters for the short-term slowdown of similar pandemics.
Affiliation(s)
- Mostapha El Jai
  - Euromed Center of Research, Euromed Polytechnic School, Euromed University of Fes, Fes, Morocco.
  - Ecole Nationale Supérieure d'Arts & Métiers, Moulay Ismail University, Meknes, Morocco.
- Mehdi Zhar
  - Euromed Center of Research, Euromed Polytechnic School, Euromed University of Fes, Fes, Morocco.
  - IMS Team, SIME Lab, ENSIAS, Mohammed V University, Rabat, Morocco
- Driss Ouazar
  - Mohamadia School of Engineers, Mohamed V University, Rabat, Morocco
- Iatimad Akhrif
  - Euromed Center of Research, Euromed Polytechnic School, Euromed University of Fes, Fes, Morocco
- Nourddin Saidou
  - Euromed Center of Research, INSA-Euromed, Euromed University of Fes, Fes, Morocco
|
44
|
Marathe G, Moodie EEM, Brouillette MJ, Cox J, Cooper C, Delaunay CL, Conway B, Hull M, Martel-Laferrière V, Vachon ML, Walmsley S, Wong A, Klein MB; Canadian Co-Infection Cohort. Predicting the presence of depressive symptoms in the HIV-HCV co-infected population in Canada using supervised machine learning. BMC Med Res Methodol 2022; 22:223. [PMID: 35962372 DOI: 10.1186/s12874-022-01700-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 07/28/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Depression is common in the human immunodeficiency virus (HIV)-hepatitis C virus (HCV) co-infected population. Demographic, behavioural, and clinical data collected in research settings may be of help in identifying those at risk for clinical depression. We aimed to predict the presence of depressive symptoms indicative of a risk of depression and identify important classification predictors using supervised machine learning. METHODS We used data from the Canadian Co-infection Cohort, a multicentre prospective cohort, and its associated sub-study on Food Security (FS). The Center for Epidemiologic Studies Depression Scale-10 (CES-D-10) was administered in the FS sub-study; participants were classified as being at risk for clinical depression if their scores were ≥ 10. We developed two random forest algorithms using the training data (80%) and tenfold cross validation to predict the CES-D-10 classes: (1) a full algorithm with all candidate predictors (137 predictors) and (2) a reduced algorithm using a subset of predictors based on expert opinion (46 predictors). We evaluated the algorithm performances in the testing data using area under the receiver operating characteristic curves (AUC) and generated predictor importance plots. RESULTS We included 1,934 FS sub-study visits from 717 participants who were predominantly male (73%), white (76%), unemployed (73%), and high school educated (52%). At the first visit, median age was 49 years (IQR: 43-54) and 53% reported presence of depressive symptoms with CES-D-10 scores ≥ 10. The full algorithm had an AUC of 0.82 (95% CI: 0.78-0.86) and the reduced algorithm of 0.76 (95% CI: 0.71-0.81). Employment, HIV clinical stage, revenue source, body mass index, and education were the five most important predictors. CONCLUSION We developed a prediction algorithm that could be instrumental in identifying individuals at risk for depression in the HIV-HCV co-infected population in research settings. Development of such machine learning algorithms using research data with rich predictor information can be useful for retrospective analyses of unanswered questions regarding impact of depressive symptoms on clinical and patient-centred outcomes among vulnerable populations.
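The AUC used above to compare the two algorithms can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the study's code; the function name, labels, and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical held-out labels (1 = CES-D-10 score >= 10) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]
print(roc_auc(labels, scores))
```

The double loop is quadratic in the number of cases, which is fine for a sketch; a rank-based formula gives the same value more efficiently on large test sets.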
|
45
|
Overton C, Casazza M, Bretz J, McDuie F, Matchett E, Mackell D, Lorenz A, Mott A, Herzog M, Ackerman J. Machine learned daily life history classification using low frequency tracking data and automated modelling pipelines: application to North American waterfowl. Mov Ecol 2022; 10:23. [PMID: 35578372 PMCID: PMC9109391 DOI: 10.1186/s40462-022-00324-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Accepted: 05/03/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND Identifying animal behaviors, life history states, and movement patterns is a prerequisite for many animal behavior analyses and effective management of wildlife and habitats. Most approaches classify short-term movement patterns with high frequency location or accelerometry data. However, patterns reflecting life history across longer time scales can have greater relevance to species biology or management needs, especially when available in near real-time. Given limitations in collecting and using such data to accurately classify complex behaviors in the long-term, we used hourly GPS data from 5 waterfowl species to produce daily activity classifications with machine-learned models using "automated modelling pipelines". METHODS Automated pipelines are computer-generated code that complete many tasks including feature engineering, multi-framework model development, training, validation, and hyperparameter tuning to produce daily classifications from eight activity patterns reflecting waterfowl life history or movement states. We developed several input features for modeling grouped into three broad categories, hereafter "feature sets": GPS locations, habitat information, and movement history. Each feature set used different data sources or data collected across different time intervals to develop the "features" (independent variables) used in models. RESULTS Automated modelling pipelines rapidly developed easily reproducible data preprocessing and analysis steps, identification and optimization of the best performing model and provided outputs for interpreting feature importance. Unequal expression of life history states caused unbalanced classes, so we evaluated feature set importance using a weighted F1-score to balance model recall and precision among individual classes. 
Although the best model using the least restrictive feature set (only 24 hourly relocations in a day) produced effective classifications (weighted F1 = 0.887), models using all feature sets performed substantially better (weighted F1 = 0.95), particularly for rarer but demographically more impactful life history states (i.e., nesting). CONCLUSIONS Automated pipelines generated models producing highly accurate classifications of complex daily activity patterns using relatively low frequency GPS and incorporating more classes than previous GPS studies. Near real-time classification is possible which is ideal for time-sensitive needs such as identifying reproduction. Including habitat and longer sequences of spatial information produced more accurate classifications but incurred slight delays in processing.
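The weighted F1-score mentioned above averages per-class F1 values with weights proportional to how often each class truly occurs. The sketch below uses that common support-weighted definition; the abstract does not state the exact implementation used in the study, and the class labels and predictions here are hypothetical.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n_true / total) * f1
    return score

# Hypothetical daily classifications: a common state and a rare one.
y_true = ["forage"] * 8 + ["nest"] * 2
y_pred = ["forage"] * 7 + ["nest"] + ["nest", "forage"]
print(round(weighted_f1(y_true, y_pred), 3))
```

Because the weights follow class frequency, a rare state such as nesting contributes little to the overall score, so per-class F1 values are worth reporting alongside it.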
Affiliation(s)
- Cory Overton
  - Western Ecological Research Center, U.S. Geological Survey, Dixon Field Station, Dixon, CA, USA.
- Michael Casazza
  - Western Ecological Research Center, U.S. Geological Survey, Dixon Field Station, Dixon, CA, USA
- Joseph Bretz
  - Cloud Hosting Solutions, U.S. Geological Survey, Bozeman, MT, USA
- Fiona McDuie
  - Western Ecological Research Center, U.S. Geological Survey, Dixon Field Station, Dixon, CA, USA
  - Moss Landing Laboratories, San Jose State University Research Foundation, San Jose, CA, USA
- Elliott Matchett
  - Western Ecological Research Center, U.S. Geological Survey, Dixon Field Station, Dixon, CA, USA
- Desmond Mackell
  - Western Ecological Research Center, U.S. Geological Survey, Dixon Field Station, Dixon, CA, USA
- Austen Lorenz
  - Western Ecological Research Center, U.S. Geological Survey, Dixon Field Station, Dixon, CA, USA
- Andrea Mott
  - Western Ecological Research Center, U.S. Geological Survey, Dixon Field Station, Dixon, CA, USA
- Mark Herzog
  - Western Ecological Research Center, U.S. Geological Survey, Dixon Field Station, Dixon, CA, USA
- Josh Ackerman
  - Western Ecological Research Center, U.S. Geological Survey, Dixon Field Station, Dixon, CA, USA
|
46
|
Naeem MZ, Rustam F, Mehmood A, Ashraf I, Choi GS. Classification of movie reviews using term frequency-inverse document frequency and optimized machine learning algorithms. PeerJ Comput Sci 2022; 8:e914. [PMID: 35494818 PMCID: PMC9044332 DOI: 10.7717/peerj-cs.914] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/04/2021] [Accepted: 02/12/2022] [Indexed: 06/12/2023]
Abstract
The Internet Movie Database (IMDb), one of the popular online databases for movies and personalities, provides a wide range of movie reviews from millions of users. This provides a large and diverse dataset for analyzing users' sentiments about various personalities and movies. Although the reviews offer a useful critique of movies, the reviews on IMDb cannot be read as a whole and require automated tools to provide insights into the sentiments they express. This study implements various machine learning models to measure the polarity of the sentiments presented in user reviews on the IMDb website. For this purpose, the reviews are first preprocessed to remove redundant information and noise, and then various classification models, such as support vector machines (SVM), Naïve Bayes classifier, random forest, and gradient boosting classifiers, are used to predict the sentiment of these reviews. The objective is to find the optimal process and approach to attain the highest accuracy with the best generalization. Various feature engineering approaches such as term frequency-inverse document frequency (TF-IDF), bag of words, global vectors for word representations, and Word2Vec are applied along with hyperparameter tuning of the classification models to enhance the classification accuracy. Experimental results indicate that the SVM obtains the highest accuracy when used with TF-IDF features, achieving an accuracy of 89.55%. The sentiment classification accuracy of the models is reduced by contradictions between the user sentiments expressed in the reviews and the assigned labels. To tackle this issue, TextBlob is used to assign a sentiment to the dataset containing reviews before it is used for training. Experimental results on TextBlob-assigned sentiments indicate that an accuracy of 92% can be obtained using the proposed model.
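The TF-IDF weighting mentioned in this abstract can be sketched in a few lines. This is an illustrative, unsmoothed variant (term frequency times log(N / document frequency)) with whitespace tokenization; the abstract does not specify which variant or library the authors used, and the sample reviews are made up.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical miniature review corpus.
reviews = ["great movie great cast", "boring movie", "great fun"]
vectors = tfidf(reviews)
print(vectors[0])
```

Terms that occur in every document receive a weight of zero under this variant, which is why rarer, more discriminative words dominate the resulting feature vectors passed to a classifier.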
Affiliation(s)
- Muhammad Zaid Naeem
  - Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Furqan Rustam
  - Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Arif Mehmood
  - Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Imran Ashraf
  - Information and Communication Engineering, Yeungnam University, Gyeongsan si, Daegu, South Korea
- Gyu Sang Choi
  - Information and Communication Engineering, Yeungnam University, Gyeongsan si, Daegu, South Korea
|
47
|
Derkarabetian S, Starrett J, Hedin M. Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data. Front Zool 2022; 19:8. [PMID: 35193622 PMCID: PMC8862334 DOI: 10.1186/s12983-022-00453-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/19/2021] [Accepted: 01/27/2022] [Indexed: 12/28/2022] Open
Abstract
The diversity of biological and ecological characteristics of organisms, and the underlying genetic patterns and processes of speciation, makes the development of universally applicable genetic species delimitation methods challenging. Many approaches, like those incorporating the multispecies coalescent, sometimes delimit populations and overestimate species numbers. This issue is exacerbated in taxa with inherently high population structure due to low dispersal ability, and in cryptic species resulting from nonecological speciation. These taxa present a conundrum when delimiting species: analyses rely heavily, if not entirely, on genetic data which over split species, while other lines of evidence lump. We showcase this conundrum in the harvester Theromaster brunneus, a low dispersal taxon with a wide geographic distribution and high potential for cryptic species. Integrating morphology, mitochondrial, and sub-genomic (double-digest RADSeq and ultraconserved elements) data, we find high discordance across analyses and data types in the number of inferred species, with further evidence that multispecies coalescent approaches over split. We demonstrate the power of a supervised machine learning approach in effectively delimiting cryptic species by creating a "custom" training data set derived from a well-studied lineage with similar biological characteristics as Theromaster. This novel approach uses known taxa with particular biological characteristics to inform unknown taxa with similar characteristics, using modern computational tools ideally suited for species delimitation. The approach also considers the natural history of organisms to make more biologically informed species delimitation decisions, and in principle is broadly applicable for taxa across the tree of life.
Affiliation(s)
- Shahan Derkarabetian
  - Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, 26 Oxford St., Cambridge, MA, 02138, USA
- James Starrett
  - Department of Entomology and Nematology, University of California, Davis, Briggs Hall, Davis, CA, 95616-5270, USA
- Marshal Hedin
  - Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA, 92182-4614, USA
|
48
|
Sun Y, Hu J, Yusuf A, Wang Y, Jin H, Zhang X, Liu Y, Wang Y, Yang G, He J. A critical review on microbial degradation of petroleum-based plastics: quantitatively effects of chemical addition in cultivation media on biodegradation efficiency. Biodegradation 2022; 33:1-16. [PMID: 35025000 DOI: 10.1007/s10532-021-09969-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Accepted: 12/12/2021] [Indexed: 01/19/2023]
Abstract
Petroleum-based plastics (PBP) with different properties have been developed to suit various needs of modern life. Nevertheless, these well-developed properties are a double-edged sword that significantly threatens the sustainability of the environment. This work focuses on the impact of microbial cultivation conditions (the elementary compositions and temperature) to provide insightful information for the process optimization of microbial degradation. The major elementary compositions in cultivation media and temperature from the literature were radically reviewed and assessed using the constructed supervised machine learning algorithm. Fifty-two literature reports were collected as a training dataset to investigate the impact of major chemical elements and cultivation temperature upon PBP biodegradation. Among six singular parameters (NH₄⁺, K⁺, PO₄³⁻, Mg²⁺, Ca²⁺, and temperature) and thirty corresponding binary parameters, four singular (NH₄⁺, K⁺, PO₄³⁻, and Mg²⁺) and six binary parameters (NH₄⁺/K⁺, NH₄⁺/PO₄³⁻, NH₄⁺/Ca²⁺, K⁺/PO₄³⁻, PO₄³⁻/Mg²⁺, Mg²⁺/Temp) were identified as statistically significant towards microbial degradation through analysis of variance (ANOVA). The binary effect (PO₄³⁻/Mg²⁺) was found to be the most statistically significant towards the microbial degradation of PBP. The concentration ranges of 0.1-0.6 g/L for Mg²⁺ and 0-2.8 g/L for PO₄³⁻ were identified as contributing to the maximum PBP biodegradation. Among all the investigated elements, Mg²⁺ is the only element that is statistically significantly associated with the variations of cultivation temperature. The optimal preparation conditions within ± 20% uncertainties based upon the range of collected literature reports are recommended.
Five representative cultivation elementary compositions (NH₄⁺, K⁺, PO₄³⁻, Mg²⁺, and Ca²⁺) and temperature were reviewed from fifty-two different literature reports to investigate their impacts on the microbial degradation of PBP using a supervised machine learning algorithm. The optimal cultivation conditions, based upon the collected literature reports, to achieve biodegradation over 80% were identified.
Affiliation(s)
- Yong Sun
  - Key Laboratory of Carbonaceous Wastes Processing and Process Intensification of Zhejiang Province, University of Nottingham Ningbo, Ningbo, 315100, China
  - School of Engineering, Edith Cowan University, 270 Joondalup Drive, Joondalup, WA, 6027, Australia
- Jing Hu
  - Key Laboratory of Carbonaceous Wastes Processing and Process Intensification of Zhejiang Province, University of Nottingham Ningbo, Ningbo, 315100, China
- Abubakar Yusuf
  - Key Laboratory of Carbonaceous Wastes Processing and Process Intensification of Zhejiang Province, University of Nottingham Ningbo, Ningbo, 315100, China
- Yixiao Wang
  - Key Laboratory of Carbonaceous Wastes Processing and Process Intensification of Zhejiang Province, University of Nottingham Ningbo, Ningbo, 315100, China
- Huan Jin
  - School of Computer Science, University of Nottingham Ningbo, Ningbo, 15100, China
- Xiyue Zhang
  - Key Laboratory of Carbonaceous Wastes Processing and Process Intensification of Zhejiang Province, University of Nottingham Ningbo, Ningbo, 315100, China
- Yiyang Liu
  - Department of Chemistry, University College London (UCL), 20 Gordon Street, London, WC1H 0AJ, UK
- Yunshan Wang
  - National Engineering Laboratory of Cleaner Hydrometallurgical Production Technology, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China
- Gang Yang
  - National Engineering Laboratory of Cleaner Hydrometallurgical Production Technology, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, 100190, China
- Jun He
  - Department of Chemical and Environmental Engineering, University of Nottingham Ningbo, Ningbo, 315100, China
  - Nottingham Ningbo China Beacons of Excellence Research and Innovation Institute, Ningbo, 315021, China
|
49
|
Lee ES, Durant TJ. Supervised machine learning in the mass spectrometry laboratory: A tutorial. J Mass Spectrom Adv Clin Lab 2022; 23:1-6. [PMID: 34984411 PMCID: PMC8692990 DOI: 10.1016/j.jmsacl.2021.12.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/14/2021] [Revised: 12/02/2021] [Accepted: 12/06/2021] [Indexed: 11/19/2022] Open
Abstract
As the demand for laboratory testing by mass spectrometry increases, so does the need for automated methods for data analysis. Clinical mass spectrometry (MS) data is particularly well-suited for machine learning (ML) methods, which deal nicely with structured and discrete data elements. The alignment of these two fields offers a promising synergy that can be used to optimize workflows, improve result quality, and enhance our understanding of high-dimensional datasets and their inherent relationship with disease. In recent years, there has been an increasing number of publications that examine the capabilities of ML-based software in the context of chromatography and MS. However, given the historically distant nature between the fields of clinical chemistry and computer science, there is an opportunity to improve technological literacy of ML-based software within the clinical laboratory scientist community. To this end, we present a basic overview of ML and a tutorial of an ML-based experiment using a previously published MS dataset. The purpose of this paper is to describe the fundamental principles of supervised ML, outline the steps that are classically involved in an ML-based experiment, and discuss the purpose of good ML practice in the context of a binary MS classification problem.
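For a binary classification problem of the kind this tutorial describes, the area under the ROC curve (listed among the key words) is a common summary metric. The sketch below is not taken from the paper's supplemental code; it assumes the standard pairwise interpretation of the area, namely the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The labels and scores are invented.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier outputs for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small illustration; larger datasets are usually handled with a rank-based computation instead.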
Key Words
- Amino acid
- Artificial intelligence
- CART, Classification and Regression Trees
- ML, Machine Learning
- MS, Mass Spectrometry
- Mass spectrometry
- NLL, Negative Log Loss
- PAA, Plasma Amino Acid
- PR, Precision-Recall
- PRAUC, Area Under the Precision-Recall Curve
- RL, Reinforcement Learning
- ROC, Receiver Operator Curve
- SCF, Supplemental Code File
- Supervised machine learning
- XGBT, Extreme Gradient Boosted Trees
- Xgboost
Affiliation(s)
- Edward S. Lee
  - Department of Laboratory Medicine, at Yale School of Medicine, New Haven, CT, USA
  - Department of Laboratory Medicine, at Yale New Haven Hospital, New Haven, CT, USA
- Thomas J.S. Durant
  - Department of Laboratory Medicine, at Yale School of Medicine, New Haven, CT, USA
  - Department of Laboratory Medicine, at Yale New Haven Hospital, New Haven, CT, USA
  - Corresponding author at: Department of Laboratory Medicine, 55 Park Street PS345D, New Haven, CT 06511, USA
|
50
|
Azam N, Ahmad T, Ul Haq N. Automatic emotion recognition in healthcare data using supervised machine learning. PeerJ Comput Sci 2021; 7:e751. [PMID: 35036528 PMCID: PMC8725656 DOI: 10.7717/peerj-cs.751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2020] [Accepted: 09/28/2021] [Indexed: 06/14/2023]
Abstract
Human feelings are fundamental to perceiving the conduct and state of mind of an individual. A healthy emotional state is one significant factor in improving personal satisfaction. On the other hand, poor emotional health can prompt social or psychological well-being issues. Recognizing or detecting feelings in online health care data gives important and helpful information regarding the emotional state of patients. Recognizing or detecting a patient's emotion regarding a specific disease using text from online sources is a challenging task. In this paper, we propose a method for the automatic detection of patients' emotions in healthcare data using supervised machine learning approaches. For this purpose, we created a new dataset named EmoHD, comprising 4,202 text samples across eight disease classes and six emotion classes, gathered from different online resources. We used six different supervised machine learning models based on different feature engineering techniques. We also performed a detailed comparison of the chosen six machine learning algorithms using different feature vectors on our dataset. We achieved the highest accuracy, 87%, using a MultiLayer Perceptron as compared to other state-of-the-art models. Moreover, we use the emotional guidance scale to show that there is a link between negative emotion and psychological health issues. Our proposed work will be helpful for automatically detecting a patient's emotion during disease and for avoiding extreme acts like suicide, mental disorders, or psychological health issues. The implementation details are made publicly available at the given link: https://bit.ly/2NQeGET.
Affiliation(s)
- Nazish Azam
  - Department of Computer Science, University of Engineering and Technology Lahore, Lahore, Pakistan
- Tauqir Ahmad
  - Department of Computer Science, University of Engineering and Technology Lahore, Lahore, Pakistan
- Nazeef Ul Haq
  - School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan
|