1. Deep learning-based identification of spine growth potential on EOS radiographs. Eur Radiol 2024; 34:2849-2860. PMID: 37848772. DOI: 10.1007/s00330-023-10308-9.
Abstract
OBJECTIVES To develop an automatic computer-based method that can help clinicians in assessing spine growth potential based on EOS radiographs. METHODS We developed a deep learning-based (DL) algorithm that can mimic the human judgment process to automatically determine spine growth potential and the Risser sign based on full-length spine EOS radiographs. A total of 3383 EOS cases were collected and used for the training and test of the algorithm. Subsequently, the completed DL algorithm underwent clinical validation on an additional 440 cases and was compared to the evaluations of four clinicians. RESULTS Regarding the Risser sign, the weighted kappa value of our DL algorithm was 0.933, while that of the four clinicians ranged from 0.909 to 0.930. In the assessment of spine growth potential, the kappa value of our DL algorithm was 0.944, while the kappa values of the four clinicians were 0.916, 0.934, 0.911, and 0.920, respectively. Furthermore, our DL algorithm obtained a slightly higher accuracy (0.973) and Youden index (0.952) compared to the best values achieved by the four clinicians. In addition, the speed of our DL algorithm was 15.2 ± 0.3 s/40 cases, much faster than the inference speeds of the clinicians, ranging from 177.2 ± 28.0 s/40 cases to 241.2 ± 64.1 s/40 cases. CONCLUSIONS Our algorithm demonstrated comparable or even better performance compared to clinicians in assessing spine growth potential. This stable, efficient, and convenient algorithm seems to be a promising approach to assist doctors in clinical practice and deserves further study. CLINICAL RELEVANCE STATEMENT This method has the ability to quickly ascertain the spine growth potential based on EOS radiographs, and it holds promise to provide assistance to busy doctors in certain clinical scenarios. KEY POINTS • In the clinic, there is no available computer-based method that can automatically assess spine growth potential. 
• We developed a deep learning-based method that can automatically ascertain spine growth potential. • Compared with the clinicians, our algorithm achieved comparable results.
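The two agreement statistics used in this entry, a weighted kappa for the ordinal Risser grades and the Youden index for the binary growth-potential call, can be reproduced with short routines. The sketch below is illustrative only; quadratic weights are assumed, since the abstract does not state the weighting scheme.

```python
from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Weighted Cohen's kappa with quadratic weights, as commonly used
    for ordinal grades such as the Risser sign (0-5)."""
    n = len(rater_a)
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)
    num = den = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            expected = marg_a[i] * marg_b[j] / n  # chance agreement count
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 - num / den

def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1.0
```

Perfect agreement yields kappa = 1.0, and any disagreement pulls the value down in proportion to the squared distance between grades, which is why the statistic suits ordinal scales.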

2. Impact of Deep Learning Image Reconstruction Methods on MRI Throughput. Radiol Artif Intell 2024; 6:e230181. PMID: 38506618. DOI: 10.1148/ryai.230181.
Abstract
Purpose To evaluate the effect of implementing two distinct commercially available deep learning reconstruction (DLR) algorithms on the efficiency of MRI examinations conducted in real clinical practice within an outpatient setting at a large, multicenter institution. Materials and Methods This retrospective study included 7346 examinations from 10 clinical MRI scanners analyzed during the pre- and postimplementation periods of DLR methods. Two different types of DLR methods, namely Digital Imaging and Communications in Medicine (DICOM)-based and k-space-based methods, were implemented in half of the scanners (three DICOM-based and two k-space-based), while the remaining five scanners had no DLR method implemented. Scan and room times of each examination type during the pre- and postimplementation periods were compared among the different DLR methods using the Wilcoxon test. Results The application of deep learning methods resulted in significant reductions in scan and room times for certain examination types. The DICOM-based method demonstrated up to a 53% reduction in scan times and a 41% reduction in room times for various study types. The k-space-based method demonstrated up to a 27% reduction in scan times but did not significantly reduce room times. Conclusion DLR methods were associated with reductions in scan and room times in a clinical setting, though the effects were heterogeneous depending on examination type. Thus, potential adopters should carefully evaluate their case mix to determine the impact of integrating these tools. Keywords: Deep Learning MRI Reconstruction, Reconstruction Algorithms, DICOM-based Reconstruction, k-Space-based Reconstruction © RSNA, 2024 See also the commentary by GharehMohammadi in this issue.

3. Olecranon bone age assessment in puberty using a lateral elbow radiograph and a deep-learning model. Eur Radiol 2024 (epub ahead of print). PMID: 38676732. DOI: 10.1007/s00330-024-10748-x.
Abstract
OBJECTIVES To improve pubertal bone age (BA) evaluation by developing a precise and practical elbow BA classification using the olecranon, and a deep-learning AI model. MATERIALS AND METHODS Lateral elbow radiographs taken for BA evaluation in children under 18 years were collected from January 2020 to June 2022, retrospectively. A novel classification and the olecranon BA were established based on the morphological changes in the olecranon ossification process during puberty. The olecranon BA was compared with other elbow and hand BA methods, using intraclass correlation coefficients (ICCs), and a deep-learning AI model was developed. RESULTS A total of 3508 lateral elbow radiographs (mean age 9.8 ± 1.8 years) were collected. The olecranon BA showed the highest applicability (100%) and interobserver agreement (ICC 0.993) among elbow BA methods. It showed excellent reliability with Sauvegrain (0.967 in girls, 0.969 in boys) and Dimeglio (0.978 in girls, 0.978 in boys) elbow BA methods, as well as Korean standard (KS) hand BA in boys (0.917), and good reliability with KS in girls (0.896) and Greulich-Pyle (GP)/Tanner-Whitehouse (TW)3 (0.835 in girls, 0.895 in boys) hand BA methods. The AI model for olecranon BA showed an accuracy of 0.96 and a specificity of 0.98 with EfficientDet-b4. External validation showed an accuracy of 0.86 and a specificity of 0.91. CONCLUSION The olecranon BA evaluation for puberty, requiring only a lateral elbow radiograph, showed the highest applicability and interobserver agreement, and excellent reliability with other BA evaluation methods, along with a high performance of the AI model. CLINICAL RELEVANCE STATEMENT This AI model uses a single lateral elbow radiograph to determine bone age for puberty from the olecranon ossification center and can improve pubertal bone age assessment with the highest applicability and excellent reliability compared to previous methods. 
KEY POINTS Elbow bone age is valuable for pubertal bone age assessment, but conventional methods have limitations. Olecranon bone age and its AI model showed high performance for pubertal bone age assessment. The olecranon bone age system is practical and accurate while requiring only a single lateral elbow radiograph.
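Interobserver agreement in this entry and several of the following ones is summarized with intraclass correlation coefficients. As a rough, self-contained illustration (not the study's code, and the study may have used a different ICC form), ICC(2,1), i.e. two-way random effects, absolute agreement, single rater, can be computed directly from the rating matrix:

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is a list of subjects, each a list of k rater scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    # Two-way ANOVA decomposition of the total sum of squares
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_err = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
```

Because ICC(2,1) measures absolute agreement, a constant offset between raters lowers the coefficient even when their rankings match perfectly.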

4. Bone age assessment based on three-dimensional ultrasound and artificial intelligence compared with paediatrician-read radiographic bone age: protocol for a prospective, diagnostic accuracy study. BMJ Open 2024; 14:e079969. PMID: 38401893. PMCID: PMC10895244. DOI: 10.1136/bmjopen-2023-079969.
Abstract
INTRODUCTION Radiographic bone age (BA) assessment is widely used to evaluate children's growth disorders and predict their future height. Moreover, children are more sensitive and vulnerable to X-ray radiation exposure than adults. The purpose of this study is to develop a new, safer, radiation-free BA assessment method for children using three-dimensional ultrasound (3D-US) and artificial intelligence (AI), and to test the diagnostic accuracy and reliability of this method. METHODS AND ANALYSIS This is a prospective, observational study. All participants will be recruited through the Paediatric Growth and Development Clinic. All participants will undergo left-hand 3D-US and X-ray examination at Shanghai Sixth People's Hospital on the same day, and all images will be recorded. These image-related data will be collected and randomly divided into a training set (80%) and a test set (20%). The training set will be used to establish a cascade network for 3D-US skeletal image segmentation and BA prediction, achieving end-to-end prediction from image to BA. The test set will be used to evaluate the accuracy of the 3D-US AI BA model. We have developed a new ultrasonic scanning device that enables automatic 3D-US scanning of the hand. AI algorithms, such as convolutional neural networks, will be used to identify and segment the skeletal structures in the hand 3D-US images. We will achieve automatic segmentation of hand skeletal 3D-US images, establish a BA prediction model for 3D-US, and test the accuracy of the prediction model. ETHICS AND DISSEMINATION The Ethics Committee of Shanghai Sixth People's Hospital approved this study (approval number 2022-019). Written informed consent will be obtained from the parent or guardian of each participant. Final results will be published in peer-reviewed journals and presented at national and international conferences. TRIAL REGISTRATION NUMBER ChiCTR2200057236.

5. Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines. Nat Commun 2024; 15:1619. PMID: 38388497. PMCID: PMC10883966. DOI: 10.1038/s41467-024-45355-3.
Abstract
The Consolidated Standards of Reporting Trials extension for Artificial Intelligence interventions (CONSORT-AI) was published in September 2020. Since its publication, several randomised controlled trials (RCTs) of AI interventions have been published, but the completeness and transparency of their reporting are unknown. This systematic review assesses the completeness of reporting of AI RCTs following publication of CONSORT-AI and provides a comprehensive summary of RCTs published in recent years. 65 RCTs were identified, mostly conducted in China (37%) and the USA (18%). Median concordance with CONSORT-AI reporting was 90% (IQR 77-94%), although only 10 RCTs explicitly reported its use. Several items were consistently under-reported, including algorithm version, accessibility of the AI intervention or code, and references to a study protocol. Only 3 of 52 included journals explicitly endorsed or mandated CONSORT-AI. Despite generally high concordance amongst recent AI RCTs, some AI-specific considerations remain systematically poorly reported. Further encouragement of CONSORT-AI adoption by journals and funders may enable more complete adoption of the full CONSORT-AI guidelines.

6. An artificial intelligence-based bone age assessment model for Han and Tibetan children. Front Physiol 2024; 15:1329145. PMID: 38426209. PMCID: PMC10902452. DOI: 10.3389/fphys.2024.1329145.
Abstract
Background: Manual bone age assessment (BAA) is associated with longer interpretation time and higher cost and variability, thus posing challenges in areas with restricted medical facilities, such as the high-altitude Tibetan Plateau. The application of artificial intelligence (AI) for automating BAA could help resolve this issue. This study aimed to develop an AI-based BAA model for Han and Tibetan children. Methods: A model named "EVG-BANet" was trained using three datasets, including the Radiological Society of North America (RSNA) dataset (training set n = 12611, validation set n = 1425, and test set n = 200), the Radiological Hand Pose Estimation (RHPE) dataset (training set n = 5491, validation set n = 713, and test set n = 79), and a self-established local dataset [training set n = 825 and test set n = 351 (Han n = 216 and Tibetan n = 135)]. An open-access state-of-the-art model BoNet was used for comparison. The accuracy and generalizability of the two models were evaluated using the abovementioned three test sets and an external test set (n = 256, all were Tibetan). Mean absolute difference (MAD) and accuracy within 1 year were used as indicators. Bias was evaluated by comparing the MAD between the demographic groups. Results: EVG-BANet outperformed BoNet in the MAD on the RHPE test set (0.52 vs. 0.63 years, p < 0.001), the local test set (0.47 vs. 0.62 years, p < 0.001), and the external test set (0.53 vs. 0.66 years, p < 0.001) and exhibited a comparable MAD on the RSNA test set (0.34 vs. 0.35 years, p = 0.934). EVG-BANet achieved accuracy within 1 year of 97.7% on the local test set (BoNet 90%, p < 0.001) and 89.5% on the external test set (BoNet 85.5%, p = 0.066). EVG-BANet showed no bias in the local test set but exhibited a bias related to chronological age in the external test set.
Conclusion: EVG-BANet can accurately predict the bone age (BA) for both Han children and Tibetan children living in the Tibetan Plateau with limited healthcare facilities.
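The headline indicators above, mean absolute difference (MAD) and accuracy within 1 year, reduce to a few lines of code. A minimal sketch with ages in years (illustrative, not the authors' implementation):

```python
def mad_years(predicted, reference):
    """Mean absolute difference between predicted and reference bone ages."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

def accuracy_within(predicted, reference, tolerance=1.0):
    """Fraction of predictions within `tolerance` years of the reference age."""
    hits = sum(abs(p - r) <= tolerance for p, r in zip(predicted, reference))
    return hits / len(predicted)
```

The same two functions cover the per-dataset and per-demographic comparisons reported in the entry: bias checks amount to computing `mad_years` separately per group and comparing the results.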

7. AI in Orthodontics: Revolutionizing Diagnostics and Treatment Planning-A Comprehensive Review. J Clin Med 2024; 13:344. PMID: 38256478. PMCID: PMC10816993. DOI: 10.3390/jcm13020344.
Abstract
The advent of artificial intelligence (AI) in medicine has transformed various medical specialties, including orthodontics. AI has shown promising results in enhancing the accuracy of diagnoses, treatment planning, and predicting treatment outcomes. Its usage in orthodontic practices worldwide has increased with the availability of various AI applications and tools. This review explores the principles of AI, its applications in orthodontics, and its implementation in clinical practice. A comprehensive literature review was conducted, focusing on AI applications in dental diagnostics, cephalometric evaluation, skeletal age determination, temporomandibular joint (TMJ) evaluation, decision making, and patient telemonitoring. Due to study heterogeneity, no meta-analysis was possible. AI has demonstrated high efficacy in all these areas, but variations in performance and the need for manual supervision suggest caution in clinical settings. The complexity and unpredictability of AI algorithms call for cautious implementation and regular manual validation. Continuous AI learning, proper governance, and addressing privacy and ethical concerns are crucial for successful integration into orthodontic practice.

8. Deeplasia: deep learning for bone age assessment validated on skeletal dysplasias. Pediatr Radiol 2024; 54:82-95. PMID: 37953411. PMCID: PMC10776485. DOI: 10.1007/s00247-023-05789-1.
Abstract
BACKGROUND Skeletal dysplasias collectively affect a large number of patients worldwide. Most of these disorders cause growth anomalies. Hence, evaluating skeletal maturity via the determination of bone age (BA) is a useful tool. Moreover, consecutive BA measurements are crucial for monitoring the growth of patients with such disorders, especially for timing hormonal treatment or orthopedic interventions. However, manual BA assessment is time-consuming and suffers from high intra- and inter-rater variability. This is further exacerbated by genetic disorders causing severe skeletal malformations. While numerous approaches to automate BA assessment have been proposed, few are validated for BA assessment on children with skeletal dysplasias. OBJECTIVE We present Deeplasia, an open-source prior-free deep-learning approach designed for BA assessment specifically validated on patients with skeletal dysplasias. MATERIALS AND METHODS We trained multiple convolutional neural network models under various conditions and selected three to build a precise model ensemble. We utilized the public BA dataset from the Radiological Society of North America (RSNA) consisting of training, validation, and test subsets containing 12,611, 1,425, and 200 hand and wrist radiographs, respectively. For testing the performance of our model ensemble on dysplastic hands, we retrospectively collected 568 radiographs from 189 patients with molecularly confirmed diagnoses of seven different genetic bone disorders including achondroplasia and hypochondroplasia. A subset of the dysplastic cohort (149 images) was used to estimate the test-retest precision of our model ensemble on longitudinal data. RESULTS The mean absolute difference of Deeplasia for the RSNA test set (based on the average of six different reference ratings) and dysplastic set (based on the average of two different reference ratings) were 3.87 and 5.84 months, respectively. 
The test-retest precision of Deeplasia on longitudinal data (2.74 months) is estimated to be similar to a human expert. CONCLUSION We demonstrated that Deeplasia is competent in assessing the age and monitoring the development of both normal and dysplastic bones.

9. Data Liberation and Crowdsourcing in Medical Research: The Intersection of Collective and Artificial Intelligence. Radiol Artif Intell 2024; 6:e230006. PMID: 38231037. PMCID: PMC10831522. DOI: 10.1148/ryai.230006.
Abstract
Despite an exponential increase in the volume of medical data produced globally, much of this data remains inaccessible to those who might best use it to develop improved health care solutions through the application of advanced analytics such as artificial intelligence. Data liberation and crowdsourcing represent two distinct but interrelated approaches to bridging existing data silos and accelerating the pace of innovation internationally. In this article, we examine these concepts in the context of medical artificial intelligence research, summarizing their potential benefits, identifying potential pitfalls, and ultimately making a case for their expanded use going forward. A practical example of a crowdsourced competition using an international medical imaging dataset is provided. Keywords: Artificial Intelligence, Data Liberation, Crowdsourcing © RSNA, 2023.

10. Automated bone age assessment in a German pediatric cohort: agreement between an artificial intelligence software and the manual Greulich and Pyle method. Eur Radiol 2023 (epub ahead of print). PMID: 38151536. DOI: 10.1007/s00330-023-10543-0.
Abstract
OBJECTIVES This study aimed to evaluate the performance of artificial intelligence (AI) software in bone age (BA) assessment according to the Greulich and Pyle (G&P) method in a German pediatric cohort. MATERIALS AND METHODS Hand radiographs of 306 pediatric patients aged 1-18 years (153 boys, 153 girls, 18 patients per year of life), including a subgroup of patients in the software's intended use age range (243 patients), were analyzed retrospectively. Two pediatric radiologists and one endocrinologist performed independent, blinded BA reads. Subsequently, the AI software estimated BA from the same images. Agreement, accuracy, and interchangeability between AI and expert readers were assessed. RESULTS The mean difference between the average of the three expert readers and the AI software was 0.39 months, with a mean absolute difference (MAD) of 6.8 months (1.73 months for the mean difference and 6.0 months for MAD in the intended use subgroup). Performance in boys was slightly worse than in girls (MAD 6.3 months vs. 5.6 months). Regression analyses showed constant bias (slope of 1.01; 95% CI: 0.99-1.02). The estimated equivalence index for interchangeability was -14.3 (95% CI: -27.6 to -1.1). CONCLUSION In terms of BA assessment, the new AI software was interchangeable with expert readers using the G&P method. CLINICAL RELEVANCE STATEMENT The use of AI software enables every physician to provide expert-reader quality in bone age assessment. KEY POINTS • A novel artificial intelligence-based software for bone age estimation has not yet been clinically validated. • Artificial intelligence showed good agreement and high accuracy with expert radiologists performing bone age assessment. • Artificial intelligence was shown to be interchangeable with expert readers.

11. Enchondroma Detection from Hand Radiographs with an Interactive Deep Learning Segmentation Tool-A Feasibility Study. J Clin Med 2023; 12:7129. PMID: 38002741. PMCID: PMC10672653. DOI: 10.3390/jcm12227129.
Abstract
Enchondromas are common benign bone tumors, usually presenting in the hand. They can cause symptoms such as swelling and pain but often go unnoticed. If the tumor expands, it can thin the bone cortices and predispose the bone to fracture. Diagnosis is based on clinical investigation and radiographic imaging. Despite their typical appearance on radiographs, they can be misdiagnosed or go entirely unrecognized in the acute trauma setting. Earlier applications of deep learning models to image classification and pattern recognition suggest that this technique may also be useful for detecting enchondroma in hand radiographs. We trained a deep learning model with 414 enchondroma radiographs to detect enchondroma from hand radiographs. A separate test set of 131 radiographs (47% with an enchondroma) was used to assess the performance of the trained deep learning model. Enchondroma annotation by three clinical experts served as our ground truth in assessing the deep learning model's performance. Our deep learning model detected 56 enchondromas from the 62 enchondroma radiographs. The area under the receiver operating characteristic curve was 0.95. The F1 score for segmentation-area overlap was 69.5%. Our deep learning model may be a useful tool for radiograph screening and raising suspicion of enchondroma.
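The reported area under the ROC curve and F1 score can be recovered from model scores and binary labels with standard formulas. The sketch below uses the rank-sum (Mann-Whitney) identity for AUC and is illustrative only, not the study's evaluation pipeline:

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney identity: the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (ties counted as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)
```

The two metrics answer different questions: AUC summarizes ranking quality across all thresholds, while F1 (here applied to area overlap) reflects one fixed operating point.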

12. Bone Age Assessment Using Artificial Intelligence in Korean Pediatric Population: A Comparison of Deep-Learning Models Trained With Healthy Chronological and Greulich-Pyle Ages as Labels. Korean J Radiol 2023; 24:1151-1163. PMID: 37899524. PMCID: PMC10613838. DOI: 10.3348/kjr.2023.0092.
Abstract
OBJECTIVE To develop a deep-learning-based bone age prediction model optimized for Korean children and adolescents and evaluate its feasibility by comparing it with a Greulich-Pyle-based deep-learning model. MATERIALS AND METHODS A convolutional neural network was trained to predict age according to the bone development shown on a hand radiograph (bone age) using 21036 hand radiographs of Korean children and adolescents without known bone development-affecting diseases/conditions obtained between 1998 and 2019 (median age [interquartile range {IQR}], 9 [7-12] years; male:female, 11794:9242) and their chronological ages as labels (Korean model). We constructed 2 separate external datasets consisting of Korean children and adolescents with healthy bone development (Institution 1: n = 343; median age [IQR], 10 [4-15] years; male: female, 183:160; Institution 2: n = 321; median age [IQR], 9 [5-14] years; male: female, 164:157) to test the model performance. The mean absolute error (MAE), root mean square error (RMSE), and proportions of bone age predictions within 6, 12, 18, and 24 months of the reference age (chronological age) were compared between the Korean model and a commercial model (VUNO Med-BoneAge version 1.1; VUNO) trained with Greulich-Pyle-based age as the label (GP-based model). RESULTS Compared with the GP-based model, the Korean model showed a lower RMSE (11.2 vs. 13.8 months; P = 0.004) and MAE (8.2 vs. 10.5 months; P = 0.002), a higher proportion of bone age predictions within 18 months of chronological age (88.3% vs. 82.2%; P = 0.031) for Institution 1, and a lower MAE (9.5 vs. 11.0 months; P = 0.022) and higher proportion of bone age predictions within 6 months (44.5% vs. 36.4%; P = 0.044) for Institution 2. 
CONCLUSION The Korean model trained using the chronological ages of Korean children and adolescents without known bone development-affecting diseases/conditions as labels performed better in bone age assessment than the GP-based model in the Korean pediatric population. Further validation is required to confirm its accuracy.

13. Estimating apparent age using artificial intelligence: Quantifying the effect of blepharoplasty. J Plast Reconstr Aesthet Surg 2023; 85:336-343. PMID: 37543022. DOI: 10.1016/j.bjps.2023.07.017.
Abstract
OBJECTIVES To quantify the rejuvenation effect of blepharoplasty. METHODS A dataset of facial photographs was assembled and randomly split into 90% training and 10% validation sets. An artificial intelligence model was trained to take a facial photograph as input and output the apparent age of the depicted face. A retrospective chart review of patients who underwent blepharoplasty was used to assemble a test set; preoperative and postoperative photographs were culled and subsequently analyzed by the model. RESULTS A total of 47394 images of patients aged 26-89 years were used for model training and validation. On the validation set, the model achieved 75% accuracy with a mean absolute error of 1.38 years and a Pearson's r of 0.92. A total of 103 patients (29 males and 74 females) met the test set inclusion criteria (upper blepharoplasty n = 28, lower blepharoplasty n = 33, and quadrilateral blepharoplasty n = 42). The test set age ranged from 30.3 to 83.8 years (mean 60.8, standard deviation 11.4). Overall, the model predicted test set patients to be 0.74 years younger preoperatively versus 2.52 years younger postoperatively (p < 0.01). Significant underestimation of age was observed in women who underwent lower blepharoplasty (n = 23, 1.28 years older preoperatively vs. 2.32 years younger postoperatively, p = 3.8 × 10⁻⁴) and men who underwent quadrilateral blepharoplasty (n = 10, 0.71 years younger preoperatively vs. 5.34 years younger postoperatively, p = 0.02). CONCLUSIONS The deep learning algorithm developed in this study demonstrates that, on average, blepharoplasty provides a rejuvenating effect of approximately 2 years.
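Validation here is summarized by mean absolute error and Pearson's r between predicted and actual ages. The correlation can be computed as below; this is an illustrative sketch, not the study's pipeline:

```python
def pearson_r(xs, ys):
    """Pearson correlation between predicted and actual ages."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

A high r alone does not guarantee small errors (a model that is consistently 5 years off can still have r near 1), which is why the abstract reports mean absolute error alongside it.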

14. The Stanford Medicine data science ecosystem for clinical and translational research. JAMIA Open 2023; 6:ooad054. PMID: 37545984. PMCID: PMC10397535. DOI: 10.1093/jamiaopen/ooad054.
Abstract
Objective To describe the infrastructure, tools, and services developed at Stanford Medicine to maintain its data science ecosystem and research patient data repository for clinical and translational research. Materials and Methods The data science ecosystem, dubbed the Stanford Data Science Resources (SDSR), includes infrastructure and tools to create, search, retrieve, and analyze patient data, as well as services for data deidentification, linkage, and processing to extract high-value information from healthcare IT systems. Data are made available via self-service and concierge access, on HIPAA-compliant secure computing infrastructure supported by in-depth user training. Results The Stanford Medicine Research Data Repository (STARR) functions as the SDSR data integration point, and includes electronic medical records, clinical images, text, bedside monitoring data, and HL7 messages. SDSR tools include electronic phenotyping and cohort-building tools, as well as a search engine for patient timelines. The SDSR supports patient data collection, reproducible research, and teaching using healthcare data, and facilitates industry collaborations and large-scale observational studies. Discussion Research patient data repositories and their underlying data science infrastructure are essential to realizing a learning health system and advancing the mission of academic medical centers. Challenges to maintaining the SDSR include ensuring sufficient financial support while providing researchers and clinicians with maximal access to data and digital infrastructure, balancing tool development with user training, and supporting the diverse needs of users. Conclusion Our experience maintaining the SDSR offers a case study for academic medical centers developing data science and research informatics infrastructure.

15. "Incidentalomas" in the Age of Artificial Intelligence. J Gen Intern Med 2023; 38:2855-2856. PMID: 37528253. PMCID: PMC10593655. DOI: 10.1007/s11606-023-08325-x.

16. Artificial intelligence suppression as a strategy to mitigate artificial intelligence automation bias. J Am Med Inform Assoc 2023; 30:1684-1692. PMID: 37561535. PMCID: PMC10531198. DOI: 10.1093/jamia/ocad118.
Abstract
BACKGROUND Incorporating artificial intelligence (AI) into clinics brings the risk of automation bias, which potentially misleads the clinician's decision-making. The purpose of this study was to propose a potential strategy to mitigate automation bias. METHODS This was a laboratory study with a randomized cross-over design. The diagnosis of anterior cruciate ligament (ACL) rupture, a common injury, on magnetic resonance imaging (MRI) was used as an example. Forty clinicians were invited to diagnose 200 ACLs with and without AI assistance. The AI's correcting and misleading (automation bias) effects on the clinicians' decision-making processes were analyzed. An ordinal logistic regression model was employed to predict the correcting and misleading probabilities of the AI. We further proposed an AI suppression strategy that retracted AI diagnoses with a higher misleading probability and provided AI diagnoses with a higher correcting probability. RESULTS The AI significantly increased clinicians' accuracy from 87.2%±13.1% to 96.4%±1.9% (P < .001). However, the clinicians' errors in the AI-assisted round were associated with automation bias, accounting for 45.5% of the total mistakes. The automation bias was found to affect clinicians of all levels of expertise. Using a logistic regression model, we identified an AI output zone with a higher probability of generating misleading diagnoses. The proposed AI suppression strategy was estimated to decrease clinicians' automation bias by 41.7%. CONCLUSION Although AI improved clinicians' diagnostic performance, automation bias was a serious problem that should be addressed in clinical practice. The proposed AI suppression strategy is a practical method for decreasing automation bias.
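The suppression strategy described above (withhold the AI diagnosis whenever its estimated misleading probability outweighs its correcting probability) reduces to a simple decision rule. The sketch below is a minimal illustration only; the function name, the margin parameter, and the example probabilities are assumptions for demonstration, not the authors' fitted ordinal regression model:

```python
def suppress_ai_output(p_correct: float, p_mislead: float, margin: float = 0.0) -> bool:
    """Withhold the AI diagnosis when its estimated probability of misleading
    the clinician exceeds its probability of correcting them (plus a margin)."""
    return p_mislead > p_correct + margin

# Invented per-case probability estimates: show the AI diagnosis only
# when it clears the rule.
cases = [(0.80, 0.05), (0.30, 0.55), (0.50, 0.50)]
shown = [not suppress_ai_output(pc, pm) for pc, pm in cases]
print(shown)  # [True, False, True]
```

In practice the two probabilities would come from a model of the AI's output zone, as in the study; the margin lets a deployment trade off how aggressively diagnoses are retracted.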
|
17
|
Validation of a deep learning algorithm for bone age estimation among patients in the city of São Paulo, Brazil. Radiol Bras 2023; 56:263-268. [PMID: 38204900 PMCID: PMC10775815 DOI: 10.1590/0100-3984.2023.0056-en] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Revised: 07/11/2023] [Accepted: 07/31/2023] [Indexed: 01/12/2024] Open
Abstract
Objective To validate a deep learning (DL) model for bone age estimation in individuals in the city of São Paulo, comparing it with the Greulich and Pyle method. Materials and Methods This was a cross-sectional study of hand and wrist radiographs obtained for the determination of bone age. The manual analysis was performed by an experienced radiologist. The model used was based on a convolutional neural network that placed third in the 2017 Radiological Society of North America challenge. The mean absolute error (MAE) and the root-mean-square error (RMSE) were calculated for the model versus the radiologist, with comparisons by sex, race, and age. Results The sample comprised 714 examinations. There was a correlation between the two methods, with a coefficient of determination of 0.94. The MAE of the predictions was 7.68 months, and the RMSE was 10.27 months. There were no statistically significant differences between sexes or among races (p > 0.05). The algorithm overestimated bone age in younger individuals (p = 0.001). Conclusion Our DL algorithm demonstrated potential for estimating bone age in individuals in the city of São Paulo, regardless of sex and race. However, improvements are needed, particularly in relation to its use in younger patients.
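The MAE and RMSE reported above are straightforward to reproduce from paired model predictions and reference readings. A minimal sketch follows; the sample values are invented for illustration and are not data from the study:

```python
import math

def mae(pred, truth):
    """Mean absolute error between predicted and reference bone ages (months)."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def rmse(pred, truth):
    """Root-mean-square error; penalizes large outliers more heavily than MAE."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

# Invented example values (months), not data from the study.
pred = [120.0, 96.0, 150.0, 132.0]
truth = [126.0, 90.0, 149.0, 140.0]
print(round(mae(pred, truth), 2), round(rmse(pred, truth), 2))  # 5.25 5.85
```

RMSE exceeding MAE, as in the abstract (10.27 vs. 7.68 months), indicates that a minority of cases carry disproportionately large errors.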
|
18
|
Predicting Disengagement to Better Support Outcomes in a Web-Based Weight Loss Program Using Machine Learning Models: Cross-Sectional Study. J Med Internet Res 2023; 25:e43633. [PMID: 37358890 DOI: 10.2196/43633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 03/21/2023] [Accepted: 04/16/2023] [Indexed: 06/27/2023] Open
Abstract
BACKGROUND Engagement is key to interventions that achieve successful behavior change and improvements in health. There is limited literature on the application of predictive machine learning (ML) models to data from commercially available weight loss programs to predict disengagement. Such data could help participants achieve their goals. OBJECTIVE This study aimed to use explainable ML to predict the risk of member disengagement week by week over 12 weeks on a commercially available web-based weight loss program. METHODS Data were available from 59,686 adults who participated in the weight loss program between October 2014 and September 2019. Data included year of birth, sex, height, weight, motivation to join the program, use statistics (eg, weight entries, entries into the food diary, views of the menu, and program content), program type, and weight loss. Random forest, extreme gradient boosting, and logistic regression with L1 regularization models were developed and validated using a 10-fold cross-validation approach. In addition, temporal validation was performed on a test cohort of 16,947 members who participated in the program between April 2018 and September 2019, and the remaining data were used for model development. Shapley values were used to identify globally relevant features and explain individual predictions. RESULTS The average age of the participants was 49.60 (SD 12.54) years, the average starting BMI was 32.43 (SD 6.19), and 81.46% (39,594/48,604) of the participants were female. The class distributions (active and inactive members) changed from 39,369 and 9235 in week 2 to 31,602 and 17,002 in week 12, respectively. 
With 10-fold cross-validation, extreme gradient boosting models had the best predictive performance, which ranged from 0.85 (95% CI 0.84-0.85) to 0.93 (95% CI 0.93-0.93) for area under the receiver operating characteristic curve and from 0.57 (95% CI 0.56-0.58) to 0.95 (95% CI 0.95-0.96) for area under the precision-recall curve (across 12 weeks of the program). They were also well calibrated. Results obtained with temporal validation ranged from 0.51 to 0.95 for area under the precision-recall curve and 0.84 to 0.93 for area under the receiver operating characteristic curve across the 12 weeks. The area under the precision-recall curve improved considerably, by 20%, in week 3 of the program. On the basis of the computed Shapley values, the most important features for predicting disengagement in the following week were those related to the total activity on the platform and entering a weight in the previous weeks. CONCLUSIONS This study showed the potential of applying ML predictive algorithms to help predict and understand participants' disengagement with a web-based weight loss program. Given the association between engagement and health outcomes, these findings can prove valuable in providing better support to individuals to enhance their engagement and potentially achieve greater weight loss.
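The area under the receiver operating characteristic curve used to evaluate these models can be computed directly from scores and binary labels via the Mann-Whitney rank formulation, with no ML library required. A small self-contained sketch with toy data (not data from the study):

```python
def roc_auc(scores, labels):
    """AUROC via the Mann-Whitney U statistic; ties receive average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Group tied scores and assign them their average 1-based rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos_ranks = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos = len(pos_ranks)
    n_neg = len(labels) - n_pos
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

For imbalanced outcomes like week-12 disengagement, the precision-recall curve the authors also report is often the more informative summary, since AUROC can look optimistic when negatives dominate.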
|
19
|
AI in radiology: is it the time for randomized controlled trials? Eur Radiol 2023; 33:4223-4225. [PMID: 36597003 DOI: 10.1007/s00330-022-09381-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Revised: 11/17/2022] [Accepted: 11/29/2022] [Indexed: 01/05/2023]
|
20
|
High performance for bone age estimation with an artificial intelligence solution. Diagn Interv Imaging 2023:S2211-5684(23)00075-X. [PMID: 37095034 DOI: 10.1016/j.diii.2023.04.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2023] [Revised: 04/05/2023] [Accepted: 04/11/2023] [Indexed: 04/26/2023]
Abstract
PURPOSE The purpose of this study was to compare the performance of an artificial intelligence (AI) solution to that of a senior general radiologist for bone age assessment. MATERIAL AND METHODS Anteroposterior hand radiographs of eight boys and eight girls from each age interval between five and 17 year-old from four different radiology departments were retrospectively collected. Two board-certified pediatric radiologists with knowledge of the sex and chronological age of the patients independently estimated the Greulich and Pyle bone age to determine the standard of reference. A senior general radiologist not specialized in pediatric radiology (further referred to as "the reader") then determined the bone age with knowledge of the sex and chronological age. The results of the reader were then compared to those of the AI solution using mean absolute error (MAE) in age estimation. RESULTS The study dataset included a total of 206 patients (102 boys of mean chronological age of 10.9 ± 3.7 [SD] years, 104 girls of mean chronological age of 11 ± 3.7 [SD] years). For both sexes, the AI algorithm showed a significantly lower MAE than the reader (P < 0.007). In boys, the MAE was 0.488 years (95% confidence interval [CI]: 0.28-0.44; r2 = 0.978) for the AI algorithm and 0.771 years (95% CI: 0.64-0.90; r2 = 0.94) for the reader. In girls, the MAE was 0.494 years (95% CI: 0.41-0.56; r2 = 0.973) for the AI algorithm and 0.673 years (95% CI: 0.54-0.81; r2 = 0.934) for the reader. CONCLUSION The AI solution better estimates the Greulich and Pyle bone age than a general radiologist does.
|
21
|
The role of patient-reported outcome measures in trials of artificial intelligence health technologies: a systematic evaluation of ClinicalTrials.gov records (1997-2022). Lancet Digit Health 2023; 5:e160-e167. [PMID: 36828608 DOI: 10.1016/s2589-7500(22)00249-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 09/29/2022] [Accepted: 12/07/2022] [Indexed: 02/24/2023]
Abstract
The extent to which patient-reported outcome measures (PROMs) are used in clinical trials for artificial intelligence (AI) technologies is unknown. In this systematic evaluation, we aim to establish how PROMs are being used to assess AI health technologies. We searched ClinicalTrials.gov for interventional trials registered from inception to Sept 20, 2022, and included trials that tested an AI health technology. We excluded observational studies, patient registries, and expanded access reports. We extracted data regarding the form, function, and intended use population of the AI health technology, in addition to the PROMs used and whether PROMs were incorporated as an input or output in the AI model. The search identified 2958 trials, of which 627 were included in the analysis. 152 (24%) of the included trials used one or more PROM, visual analogue scale, patient-reported experience measure, or usability measure as a trial endpoint. The type of AI health technologies used by these trials included AI-enabled smart devices, clinical decision support systems, and chatbots. The number of clinical trials of AI health technologies registered on ClinicalTrials.gov and the proportion of trials that used PROMs increased from registry inception to 2022. The most common clinical areas AI health technologies were designed for were digestive system health for non-PROM trials and musculoskeletal health (followed by mental and behavioural health) for PROM trials, with PROMs commonly used in clinical areas for which assessment of health-related quality of life and symptom burden is particularly important. Additionally, AI-enabled smart devices were the most common applications tested in trials that used at least one PROM. 24 trials tested AI models that captured PROM data as an input for the AI model. PROM use in clinical trials of AI health technologies falls behind PROM use in all clinical trials. 
Inadequate detail in trial records regarding the PROMs used or the type of AI health technology tested was a limitation of this systematic evaluation and might have contributed to inaccuracies in the data synthesised. Overall, the use of PROMs in the function and assessment of AI health technologies is not only possible, but is a powerful way of showing that, even in the most technologically advanced health-care systems, patients' perspectives remain central.
|
22
|
Online software Boneureka assessing bone age based on metacarpal length in healthy children: proof-of-concept study. Pediatr Radiol 2023; 53:1100-1107. [PMID: 36853377 DOI: 10.1007/s00247-023-05595-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Revised: 12/23/2022] [Accepted: 01/10/2023] [Indexed: 03/01/2023]
Abstract
BACKGROUND Bone age in children is mainly assessed using the Greulich and Pyle (GP) atlas, a validated method with limited interobserver accuracy. While automated methods increase interobserver accuracy, they represent considerable costs and technical requirements. OBJECTIVE A proof-of-concept study to create and evaluate an online software program, Boneureka©, based on linear metacarpal length measurements, to assess bone age in healthy children. MATERIALS AND METHODS The study retrospectively included 434 consecutive children (215 girls) who underwent a left-hand radiograph to rule out trauma between March 2008 and December 2017. Two reviewers measured the second to fourth metacarpal lengths on each radiograph and the distance between the centre of the epiphyses of the second and fifth metacarpals. A single reviewer estimated the bone age using the GP atlas. The automated software assessed the bone age for all radiographs. A mathematical model was developed based on linear regressions to provide the mean bone age and standard deviation based on the estimates. Pearson and intraclass correlation coefficients (ICC) were used to evaluate the correlation and agreement between the estimated bone ages using Boneureka©, the GP atlas and BoneXpert® compared to chronological age. RESULTS The measure that showed the highest correlation (r2=0.877 for girls and r2=0.834 for boys; P<.001) and the highest ICC (ICC=0.937 for girls and ICC=0.926 for boys; P<0.001) with chronological age was the length of the second metacarpal. The GP atlas and the automated software evaluation had excellent ICC with chronological age (ICC>0.95 for both methods and sexes). Using these data, we created an online software program based on the second metacarpal length to obtain bone age estimates, means and standard deviations. CONCLUSION The newly created online software Boneureka©, based on the second metacarpal length, is a reliable and user-friendly tool to assess bone age in healthy children.
Further studies on a larger population should be performed to validate the developed reference values.
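The core of the approach above is an ordinary least-squares regression from second-metacarpal length to mean bone age. A minimal sketch follows; the measurement values below are invented for illustration and are not the study's reference data:

```python
def fit_line(x, y):
    """Ordinary least-squares fit returning (intercept, slope) for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

# Invented measurements: second-metacarpal length (mm) vs. bone age (years).
lengths = [40.0, 48.0, 55.0, 62.0, 70.0]
ages = [6.1, 8.0, 9.8, 11.9, 14.0]
intercept, slope = fit_line(lengths, ages)
predicted_age = intercept + slope * 58.0  # estimate for a 58-mm metacarpal
```

A production tool would fit separate models per sex, as the abstract's sex-specific correlations suggest, and report the residual standard deviation alongside each estimate.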
|
23
|
Generalizability and Bias in a Deep Learning Pediatric Bone Age Prediction Model Using Hand Radiographs. Radiology 2023; 306:e220505. [PMID: 36165796 DOI: 10.1148/radiol.220505] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Abstract
Background Although deep learning (DL) models have demonstrated expert-level ability for pediatric bone age prediction, they have shown poor generalizability and bias in other use cases. Purpose To quantify generalizability and bias in a bone age DL model, measured by performance on external versus internal test sets and performance differences between demographic groups, respectively. Materials and Methods The winning DL model of the 2017 RSNA Pediatric Bone Age Challenge was retrospectively evaluated; it had been trained on 12 611 pediatric hand radiographs from two U.S. hospitals. The DL model was tested from September 2021 to December 2021 on an internal validation set and an external test set of pediatric hand radiographs with diverse demographic representation. Images with reported ground-truth bone age were included in the study. The mean absolute difference (MAD) between the ground-truth bone age and the model-predicted bone age was calculated for each set. Generalizability was evaluated by comparing MAD between internal and external evaluation sets with use of t tests. Bias was evaluated by comparing MAD and the clinically significant error rate (the rate of errors changing the clinical diagnosis) between demographic groups with use of t tests or analysis of variance and χ2 tests, respectively (statistically significant difference defined as P < .05). Results The internal validation set had images from 1425 individuals (773 boys), and the external test set had images from 1202 individuals (mean age, 133 months ± 60 [SD]; 614 boys). The bone age model generalized well to the external test set, with no difference in MAD (6.8 months in the validation set vs 6.9 months in the external set; P = .64). Model predictions would have led to clinically significant errors in 194 of 1202 images (16%) in the external test set.
The MAD was greater for girls than boys in the internal validation set (P = .01) and in the subcategories of age and Tanner stage in the external test set (P < .001 for both). Conclusion A deep learning (DL) bone age model generalized well to an external test set, although clinically significant sex-, age-, and sexual maturity-based biases in DL bone age were identified. © RSNA, 2022. Online supplemental material is available for this article. See also the editorial by Larson in this issue.
|
24
|
Detecting pediatric wrist fractures using deep-learning-based object detection. Pediatr Radiol 2023; 53:1125-1134. [PMID: 36650360 DOI: 10.1007/s00247-023-05588-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/04/2022] [Revised: 12/09/2022] [Accepted: 12/30/2022] [Indexed: 01/19/2023]
Abstract
BACKGROUND Missed fractures are the leading cause of diagnostic error in the emergency department, and fractures of pediatric bones, particularly subtle wrist fractures, can be misidentified because of their varying characteristics and responses to injury. OBJECTIVE This study evaluated the utility of an object detection deep learning framework for classifying pediatric wrist radiographs as positive or negative for fracture, including subtle buckle fractures of the distal radius, and evaluated the performance of this algorithm as augmentation to trainee radiograph interpretation. MATERIALS AND METHODS We obtained 395 posteroanterior wrist radiographs from unique pediatric patients (65% positive for fracture, 30% positive for distal radial buckle fracture) and divided them into train (n = 229), tune (n = 41) and test (n = 125) sets. We trained a Faster R-CNN (region-based convolutional neural network) deep learning object-detection model. Two pediatric and two radiology residents evaluated radiographs initially without artificial intelligence (AI) assistance and subsequently with access to the bounding box generated by the Faster R-CNN model. RESULTS The Faster R-CNN model demonstrated an area under the curve (AUC) of 0.92 (95% confidence interval [CI] 0.87-0.97), accuracy of 88% (n = 110/125; 95% CI 81-93%), sensitivity of 88% (n = 70/80; 95% CI 78-94%) and specificity of 89% (n = 40/45; 95% CI 76-96%) in identifying any fracture and identified 90% of buckle fractures (n = 35/39; 95% CI 76-97%). Access to Faster R-CNN model predictions significantly improved average resident accuracy from 80 to 93% in detecting any fracture (P < 0.001) and from 69 to 92% in detecting buckle fracture (P < 0.001). After accessing AI predictions, residents significantly outperformed AI in cases of disagreement (73% resident correct vs. 27% AI, P = 0.002).
CONCLUSION An object-detection-based deep learning approach trained with only a few hundred examples identified radiographs containing pediatric wrist fractures with high accuracy. Access to model predictions significantly improved resident accuracy in diagnosing these fractures.
|
25
|
Application of artificial intelligence to imaging interpretations in the musculoskeletal area: Where are we? Where are we going? Joint Bone Spine 2023; 90:105493. [PMID: 36423783 DOI: 10.1016/j.jbspin.2022.105493] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2022] [Revised: 10/30/2022] [Accepted: 11/02/2022] [Indexed: 11/23/2022]
Abstract
The interest of researchers, clinicians, and radiologists in artificial intelligence (AI) continues to grow. Deep learning is a subset of machine learning in which the computer algorithm itself can determine the optimal imaging features to answer a clinical question. Convolutional neural networks are the most common architecture for performing deep learning on medical images. Musculoskeletal applications of deep learning include the detection of abnormalities on X-rays or cross-sectional images (CT, MRI), such as fractures, meniscal tears, anterior cruciate ligament tears, degenerative lesions of the spine, and bone metastases; classification, for example of dural sac stenosis or degeneration of intervertebral discs; assessment of skeletal age; and segmentation, for example of cartilage. Software developments are already impacting the daily practice of orthopedic imaging by automatically detecting fractures on radiographs. Improving image acquisition protocols, improving the quality of low-dose CT images, reducing acquisition times in MRI, and improving MR image resolution are all possible through deep learning. Deep learning offers an automated way to offload time-consuming manual processes and improve practitioner performance. This article reviews the current state of AI in musculoskeletal imaging.
|
26
|
Methods for Clinical Evaluation of Artificial Intelligence Algorithms for Medical Diagnosis. Radiology 2023; 306:20-31. [PMID: 36346314 DOI: 10.1148/radiol.220182] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Adequate clinical evaluation of artificial intelligence (AI) algorithms before adoption in practice is critical. Clinical evaluation aims to confirm acceptable AI performance through adequate external testing and confirm the benefits of AI-assisted care compared with conventional care through appropriately designed and conducted studies, for which prospective studies are desirable. This article explains some of the fundamental methodological points that should be considered when designing and appraising the clinical evaluation of AI algorithms for medical diagnosis. The specific topics addressed include the following: (a) the importance of external testing of AI algorithms and strategies for conducting the external testing effectively, (b) the various metrics and graphical methods for evaluating the AI performance as well as essential methodological points to note in using and interpreting them, (c) paired study designs primarily for comparative performance evaluation of conventional and AI-assisted diagnoses, (d) parallel study designs primarily for evaluating the effect of AI intervention with an emphasis on randomized clinical trials, and (e) up-to-date guidelines for reporting clinical studies on AI, with an emphasis on guidelines registered in the EQUATOR Network library. Sound methodological knowledge of these topics will aid the design, execution, reporting, and appraisal of clinical evaluation of AI.
|
27
|
A Comparison of 2 Abbreviated Methods for Assessing Adolescent Bone Age: The Shorthand Bone Age Method and the SickKids/Columbia Method. J Pediatr Orthop 2023; 43:e80-e85. [PMID: 36155388 DOI: 10.1097/bpo.0000000000002269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
BACKGROUND Radiographic assessment of bone age is critically important to decision-making on the type and timing of operative interventions in pediatric orthopaedics. The current widely accepted method for determining bone age is time and resource-intensive. This study sought to compare the reliability and accuracy of 2 abbreviated methods, the Shorthand Bone Age (SBA) and the SickKids/Columbia (SKC) methods, to the widely accepted Greulich and Pyle (GP) method. METHODS Standard posteroanterior radiographs of the left hand of 125 adolescent males and 125 adolescent females were compiled, with bone ages determined by the GP method ranging from 9 to 16 years for males and 8 to 14 years for females. Blinded to the chronologic age and GP bone age of each child, the bone age for each radiograph was determined using the SBA and SKC methods by an orthopaedic surgery resident, 2 pediatric orthopaedic surgeons, and a musculoskeletal radiologist. Measurements were then repeated 2 weeks later after rerandomization of the radiographs. Intrarater and interrater reliability for the 2 abbreviated methods, as well as the agreement between all 3 methods, were calculated using weighted κ values. Mean absolute differences between methods were also calculated. RESULTS Both bone age methods demonstrated substantial to almost perfect intrarater reliability, with a weighted κ ranging from 0.79 to 0.93 for the SBA method and from 0.82 to 0.96 for the SKC method. Interrater reliability was moderate to substantial (weighted κ: 0.55 to 0.84) for the SBA method and substantial to almost perfect (weighted κ: 0.67 to 0.92) for the SKC method. Agreement between the 3 methods was substantial for all raters and all comparisons. The mean absolute difference between GP-derived and SBA-derived bone ages was 7.6±7.8 months, as compared with 8.8±7.4 months between GP-derived and SKC-derived bone ages.
CONCLUSIONS The SBA and SKC methods have comparable reliability, and both correlate well with the widely accepted GP method and with each other. However, they have relatively large absolute differences when compared with the GP method. These methods offer simple, efficient, and affordable estimates for bone age determination, but at best provide an estimate to be used in the appropriate setting. LEVEL OF EVIDENCE Diagnostic study-level III.
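The weighted κ used above to quantify rater agreement over ordered bone-age categories can be computed from paired ratings. The sketch below assumes linear disagreement weights by default (the abstract does not state the weighting scheme, so that choice is an assumption); the example ratings are invented:

```python
from collections import Counter

def weighted_kappa(a, b, categories, quadratic=False):
    """Weighted Cohen's kappa for two raters over ordered categories.
    Linear weights by default; set quadratic=True for squared weights."""
    idx = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(a)

    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d * d if quadratic else d

    # Observed joint distribution of the two raters' labels.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1 / n
    # Expected joint distribution under independence (product of marginals).
    pa, pb = Counter(a), Counter(b)
    num = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    den = sum(w(i, j) * (pa[categories[i]] / n) * (pb[categories[j]] / n)
              for i in range(k) for j in range(k))
    return 1 - num / den

print(weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], categories=[0, 1]))  # 0.5
```

Because disagreements are weighted by ordinal distance, a rating one category off is penalized far less than one several categories off, which suits graded scales like bone age.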
|
28
|
Deep learning of birth-related infant clavicle fractures: a potential virtual consultant for fracture dating. Pediatr Radiol 2022; 52:2206-2214. [PMID: 35578043 DOI: 10.1007/s00247-022-05380-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/02/2021] [Revised: 03/08/2022] [Accepted: 04/13/2022] [Indexed: 02/06/2023]
Abstract
BACKGROUND In infant abuse investigations, dating of skeletal injuries from radiographs is desirable to reach a clear timeline of traumatic events. Prior studies have used infant birth-related clavicle fractures as a surrogate to develop a framework for dating of abuse-related fractures. OBJECTIVE To develop and train a deep learning algorithm that can accurately date infant birth-related clavicle fractures. MATERIALS AND METHODS We modified a deep learning model initially designed for face-age estimation to date infant clavicle fractures. We conducted a computerized search of imaging reports and other medical records at a tertiary children's hospital to identify radiographs of birth-related clavicle fracture in infants ≤ 3 months old (July 2003 to March 2021). We used the resultant database for model training, validation and testing. We evaluated the performance of the deep learning model via a four-fold cross-validation procedure, and calculated accuracy metrics: mean absolute error (MAE), root mean square error (RMSE), intraclass correlation coefficient (ICC) and cumulative score. RESULTS The curated database consisted of 416 clavicle radiographs from 213 infants. Average chronological age (equivalent to fracture age) at time of imaging was 24 days. This model estimated the ages of the clavicle fractures with MAE of 4.2 days, RMSE of 6.3 days and ICC of 0.919. On average, 83.7% of the fracture age estimates were accurate to within 7 days of the ground truth. CONCLUSION Our deep learning study provides encouraging results for radiographic dating of infant clavicle fractures. With further development and validation, this model might serve as a virtual consultant to radiologists estimating fracture ages in cases of suspected infant abuse.
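The cumulative score reported above (the share of fracture-age estimates falling within a tolerance of the ground truth) is simple to reproduce. A sketch with invented estimates, not data from the study:

```python
def cumulative_score(pred_days, truth_days, tol_days=7):
    """Fraction of age estimates within tol_days of the ground-truth fracture age."""
    hits = sum(abs(p - t) <= tol_days for p, t in zip(pred_days, truth_days))
    return hits / len(pred_days)

# Invented fracture-age estimates vs. ground truth (days).
pred = [10, 25, 3, 40, 18]
truth = [12, 20, 4, 55, 17]
print(cumulative_score(pred, truth))  # 0.8
```

Reporting this alongside MAE and RMSE, as the authors do, shows how often the model is clinically close enough, not just how large its average error is.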
|
29
|
Data governance functions to support responsible data stewardship in pediatric radiology research studies using artificial intelligence. Pediatr Radiol 2022; 52:2111-2119. [PMID: 35790559 DOI: 10.1007/s00247-022-05427-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/19/2021] [Revised: 04/13/2022] [Accepted: 06/06/2022] [Indexed: 03/03/2023]
Abstract
The integration of human and machine intelligence promises to profoundly change the practice of medicine. The rapidly increasing adoption of artificial intelligence (AI) solutions highlights its potential to streamline physician work and optimize clinical decision-making, including in the field of pediatric radiology. Large imaging databases are necessary for training, validating and testing these algorithms. To better promote data accessibility in multi-institutional AI-enabled radiologic research, these databases centralize the large volumes of data required to develop accurate models and outcome predictions. However, such undertakings must consider the sensitivity of patient information and therefore utilize requisite data governance measures to safeguard data privacy and security, to recognize and mitigate the effects of bias, and to promote ethical use. In this article we define data stewardship and data governance, review their key considerations and applicability to radiologic research in the pediatric context, and consider the associated best practices along with the ramifications of poorly executed data governance. We summarize several adaptable data governance frameworks and describe strategies for their implementation in the form of distributed and centralized approaches to data management.
|
30
|
The Challenges of Regulating Artificial Intelligence in Healthcare Comment on "Clinical Decision Support and New Regulatory Frameworks for Medical Devices: Are We Ready for It? - A Viewpoint Paper". Int J Health Policy Manag 2022; 12:7261. [PMID: 36243948 PMCID: PMC10125205 DOI: 10.34172/ijhpm.2022.7261] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2022] [Accepted: 09/07/2022] [Indexed: 11/07/2022] Open
Abstract
Regulation of health technologies must be rigorous, instilling trust among both healthcare providers and patients. This is especially important for the control and supervision of the growing use of artificial intelligence in healthcare. In this commentary on the accompanying piece by Van Laere and colleagues, we set out the scope for applying artificial intelligence in the healthcare sector and outline five key challenges that regulators face in dealing with these modern-day technologies. Addressing these challenges will not be easy. While artificial intelligence applications in healthcare have already made rapid progress and benefitted patients, these applications clearly hold even more potential for future developments. Yet it is vital that the regulatory environment keep up with this fast-evolving space of healthcare in order to anticipate and, to the extent possible, prevent the risks that may arise.
|
31
|
Abstract
IMPORTANCE Despite the potential of machine learning to improve multiple aspects of patient care, barriers to clinical adoption remain. Randomized clinical trials (RCTs) are often a prerequisite to large-scale clinical adoption of an intervention, and important questions remain regarding how machine learning interventions are being incorporated into clinical trials in health care. OBJECTIVE To systematically examine the design, reporting standards, risk of bias, and inclusivity of RCTs for medical machine learning interventions. EVIDENCE REVIEW In this systematic review, the Cochrane Library, Google Scholar, Ovid Embase, Ovid MEDLINE, PubMed, Scopus, and Web of Science Core Collection online databases were searched and citation chasing was done to find relevant articles published from the inception of each database to October 15, 2021. Search terms for machine learning, clinical decision-making, and RCTs were used. Exclusion criteria included implementation of a non-RCT design, absence of original data, and evaluation of nonclinical interventions. Data were extracted from published articles. Trial characteristics, including primary intervention, demographics, adherence to the CONSORT-AI reporting guideline, and Cochrane risk of bias were analyzed. FINDINGS Literature search yielded 19 737 articles, of which 41 RCTs involved a median of 294 participants (range, 17-2488 participants). A total of 16 RCTs (39%) were published in 2021, 21 (51%) were conducted at single sites, and 15 (37%) involved endoscopy. No trials adhered to all CONSORT-AI standards. Common reasons for nonadherence were not assessing poor-quality or unavailable input data (38 trials [93%]), not analyzing performance errors (38 [93%]), and not including a statement regarding code or algorithm availability (37 [90%]). Overall risk of bias was high in 7 trials (17%).
Of 11 trials (27%) that reported race and ethnicity data, the median proportion of participants from underrepresented minority groups was 21% (range, 0%-51%). CONCLUSIONS AND RELEVANCE This systematic review found that despite the large number of medical machine learning-based algorithms in development, few RCTs for these technologies have been conducted. Among published RCTs, there was high variability in adherence to reporting standards and risk of bias and a lack of participants from underrepresented minority groups. These findings merit attention and should be considered in future RCT design and reporting.
|
32
|
Inferring pediatric knee skeletal maturity from MRI using deep learning. Skeletal Radiol 2022; 51:1671-1677. [PMID: 35184211 DOI: 10.1007/s00256-022-04010-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/10/2021] [Revised: 01/29/2022] [Accepted: 02/04/2022] [Indexed: 02/02/2023]
Abstract
PURPOSE Many children who undergo MR of the knee to evaluate traumatic injury may not undergo a separate dedicated evaluation of their skeletal maturity, and we wished to investigate how accurately skeletal maturity could be automatically inferred from knee MRI using deep learning to offer this additional information to clinicians. MATERIALS AND METHODS Retrospective data from 894 studies from 783 patients were obtained (mean age 13.1 years, 47% female). Coronal and sagittal sequences that were T1/PD-weighted were included and resized to 224 × 224 pixels. Data were divided into train (n = 673), tune (n = 48), and test (n = 173) sets, and children were separated across sets. The chronologic age was predicted using deep learning approaches based on a long short-term memory (LSTM) model, which took as input DenseNet-121-extracted features from all T1/PD coronal and sagittal slices. Each test case was manually assigned a bone age by two radiology residents using a reference atlas provided by Pennock and Bomar. The patient's age served as ground truth. RESULTS The error of the model's predictions for chronological age was not significantly different from that of radiology residents (model M.S.E. 1.30 vs. resident 0.99, paired t-test = 1.47, p = 0.14). Pearson correlation between model and resident prediction of chronologic age was 0.96 (p < 0.001). CONCLUSION A deep learning-based approach demonstrated the ability to infer skeletal maturity from knee MR sequences that was not significantly different from resident performance and did so in less than 2% of the time required by a human expert. This may offer a method for automatically evaluating lower extremity skeletal maturity as part of every MR examination.
|
33
|
Abstract
Background Lumbar spine MRI studies are widely used for back pain assessment. Interpretation involves grading lumbar spinal stenosis, which is repetitive and time consuming. Deep learning (DL) could provide faster and more consistent interpretation. Purpose To assess the speed and interobserver agreement of radiologists for reporting lumbar spinal stenosis with and without DL assistance. Materials and Methods In this retrospective study, a DL model designed to assist radiologists in the interpretation of spinal canal, lateral recess, and neural foraminal stenoses on lumbar spine MRI scans was used. Randomly selected lumbar spine MRI studies obtained in patients with back pain who were 18 years and older over a 3-year period, from September 2015 to September 2018, were included in an internal test data set. Studies with instrumentation and scoliosis were excluded. Eight radiologists, each with 2-13 years of experience in spine MRI interpretation, reviewed studies with and without DL model assistance with a 1-month washout period. Time to diagnosis (in seconds) and interobserver agreement (using Gwet κ) were assessed for stenosis grading for each radiologist with and without the DL model and compared with test data set labels provided by an external musculoskeletal radiologist (with 32 years of experience) as the reference standard. Results Overall, 444 images in 25 patients (mean age, 51 years ± 20 [SD]; 14 women) were evaluated in a test data set. DL-assisted radiologists had a reduced interpretation time per spine MRI study, from a mean of 124-274 seconds (SD, 25-88 seconds) to 47-71 seconds (SD, 24-29 seconds) (P < .001). DL-assisted radiologists had either superior or equivalent interobserver agreement for all stenosis gradings compared with unassisted radiologists. 
DL-assisted general and in-training radiologists improved their interobserver agreement for four-class neural foraminal stenosis, with κ values of 0.71 and 0.70 (with DL) versus 0.39 and 0.39 (without DL), respectively (both P < .001). Conclusion Radiologists who were assisted by deep learning for interpretation of lumbar spinal stenosis on MRI scans showed a marked reduction in reporting time and superior or equivalent interobserver agreement for all stenosis gradings compared with radiologists who were unassisted by deep learning. © RSNA, 2022 Online supplemental material is available for this article. See also the editorial by Hayashi in this issue.
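The interobserver statistic reported in this abstract is Gwet κ, which is more robust to marginal prevalence than Cohen's kappa. As a rough, generic illustration of chance-corrected agreement on ordinal grades (not the study's code; the related quadratic-weighted Cohen's kappa is shown, and the function name and weighting scheme are our own assumptions):

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Quadratic-weighted Cohen's kappa for ordinal ratings
    (e.g. stenosis grades). Distant disagreements are penalised
    more heavily than near-misses."""
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint distribution of the two raters' grades.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[index[a]][index[b]] += 1.0 / n

    # Marginal distributions for each rater.
    pa = [sum(row) for row in obs]
    pb = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    # Quadratic disagreement weight: 0 on the diagonal, 1 at the extremes.
    def w(i, j):
        return ((i - j) ** 2) / ((k - 1) ** 2) if k > 1 else 0.0

    d_obs = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w(i, j) * pa[i] * pb[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp if d_exp else 1.0
```

Perfect agreement yields 1.0 and perfectly reversed ordinal ratings yield -1.0; values such as the 0.70-0.71 reported above indicate substantial agreement.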
|
34
|
AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health 2022; 4:e406-e414. [PMID: 35568690 PMCID: PMC9650160 DOI: 10.1016/s2589-7500(22)00063-2] [Citation(s) in RCA: 106] [Impact Index Per Article: 53.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2021] [Revised: 03/03/2022] [Accepted: 03/18/2022] [Indexed: 02/01/2023]
Abstract
BACKGROUND Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlation for race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images. METHODS Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we evaluated, first, performance quantification of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding of anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models by testing them on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race. FINDINGS In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging [area under the receiver operating characteristics curve (AUC) range 0·91-0·99], CT chest imaging [0·87-0·96], and mammography [0·81]). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index [AUC 0·55], disease distribution [0·61], and breast density [0·61]). 
Finally, we provide evidence to show that the ability of AI deep learning models persisted over all anatomical regions and frequency spectrums of the images, suggesting the efforts to control this behaviour when it is undesirable will be challenging and demand further study. INTERPRETATION The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging. FUNDING National Institute of Biomedical Imaging and Bioengineering, MIDRC grant of National Institutes of Health, US National Science Foundation, National Library of Medicine of the National Institutes of Health, and Taiwan Ministry of Science and Technology.
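The AUC figures above are areas under the receiver operating characteristic curve. As a minimal illustration of what that number means (not the study's code; names are illustrative), AUROC equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one, which can be computed directly via the Mann-Whitney U statistic:

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs in which the positive case receives
    the higher score (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On this reading, the reported AUCs of 0.91-0.99 mean the models rank a positive above a negative in well over nine of every ten such pairs.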
|
35
|
Detecting total hip arthroplasty dislocations using deep learning: clinical and Internet validation. Emerg Radiol 2022; 29:801-808. [PMID: 35608786 DOI: 10.1007/s10140-022-02060-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2022] [Accepted: 05/12/2022] [Indexed: 10/18/2022]
Abstract
OBJECTIVE Periprosthetic dislocations of total hip arthroplasty (THA) are time-sensitive injuries, as the longer diagnosis and treatment are delayed, the more difficult they are to reduce. Automated triage of radiographs with dislocations could help reduce these delays. We trained convolutional neural networks (CNNs) for the detection of THA dislocations, and assessed their generalizability by evaluating them on external datasets. METHODS We used 357 THA radiographs from a single hospital (185 with dislocation [51.8%]) to develop and internally test a variety of CNNs to identify THA dislocation. We performed external testing of these CNNs on two datasets to evaluate generalizability. CNN performance was evaluated using area under the receiver operating characteristic curve (AUROC). Class activation mapping (CAM) was used to create heatmaps of test images for visualization of regions emphasized by the CNNs. RESULTS Multiple CNNs achieved AUROCs of 1 for both internal and external test sets, indicating good generalizability. Heatmaps showed that CNNs consistently emphasized the THA for both dislocated and located THAs. CONCLUSION CNNs can be trained to recognize THA dislocation with high diagnostic performance, which supports their potential use for triage in the emergency department. Importantly, our CNNs generalized well to external data from two sources, further supporting their potential clinical utility.
|
36
|
A Systematic Review of Deep Learning Applications for Optical Coherence Tomography in Age-Related Macular Degeneration. Retina 2022; 42:1417-1424. [DOI: 10.1097/iae.0000000000003535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
37
|
Machine Learning for Hepatocellular Carcinoma Segmentation at MRI: Radiology In Training. Radiology 2022; 304:509-515. [PMID: 35536132 DOI: 10.1148/radiol.212386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
A 68-year-old woman with a history of hepatocellular carcinoma underwent conventional transarterial chemoembolization. Manual tumor segmentation on images, which can be used to assess disease progression, is time consuming and may suffer from interobserver reliability issues. The authors present a how-to guide to develop machine learning algorithms for fully automatic segmentation of hepatocellular carcinoma and other tumors for lesion tracking over time.
|
38
|
Accuracy and self-validation of automated bone age determination. Sci Rep 2022; 12:6388. [PMID: 35430607 PMCID: PMC9013398 DOI: 10.1038/s41598-022-10292-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2022] [Accepted: 03/29/2022] [Indexed: 11/20/2022] Open
Abstract
The BoneXpert method for automated determination of bone age from hand X-rays was introduced in 2009 and is currently running in over 200 hospitals. The aim of this work is to present version 3 of the method and validate its accuracy and self-validation mechanism that automatically rejects an image if it is at risk of being analysed incorrectly. The training set included 14,036 images from the 2017 Radiological Society of North America (RSNA) Bone Age Challenge, 1642 images of normal Dutch and Californian children, and 8250 images from Tübingen from patients with Short Stature, Congenital Adrenal Hyperplasia and Precocious Puberty. The study resulted in a cross-validated root mean square (RMS) error in the Tübingen images of 0.62 y, compared to 0.72 y in the previous version. The RMS error on the RSNA test set of 200 images was 0.45 y relative to the average of six manual ratings. The self-validation mechanism rejected 0.4% of the RSNA images. 121 outliers among the self-validated images of the Tübingen study were rerated, resulting in 6 cases where BoneXpert deviated more than 1.5 years from the average of the three re-ratings, compared to 72 such cases for the original manual ratings. The accuracy of BoneXpert is clearly better than the accuracy of a single manual rating. The self-validation mechanism rejected very few images, typically with abnormal anatomy, and among the accepted images, there were 12 times fewer severe bone age errors than in manual ratings, suggesting that BoneXpert could be safer than manual rating.
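The cross-validated root mean square errors cited above (0.62 y and 0.45 y) follow the standard RMS definition; a minimal sketch of that computation (not BoneXpert's code; the function name is illustrative):

```python
import math

def rms_error(predicted, reference):
    """Root-mean-square deviation between automated bone-age estimates
    and reference ratings, both in years."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(
        sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)
    )
```

In a validation such as this one, `reference` would hold the average of the manual ratings for each image and `predicted` the automated estimates.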
|
39
|
Radiomics and Artificial Intelligence: From Academia to Clinical Practice. Radiology 2022; 303:542-543. [PMID: 35230192 DOI: 10.1148/radiol.220081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|
40
|
Assessing Bone Age: A Paradigm for the Next Generation of Artificial Intelligence in Radiology. Radiology 2021; 301:700-701. [PMID: 34581631 DOI: 10.1148/radiol.2021211339] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|