1
Diagnosis of pulmonary tuberculosis with 3D neural network based on multi-scale attention mechanism. Med Biol Eng Comput 2024; 62:1589-1600. [PMID: 38319503 DOI: 10.1007/s11517-024-03022-1]
Abstract
This paper presents a novel multi-scale attention residual network (MAResNet) for diagnosing patients with pulmonary tuberculosis (PTB) from computed tomography (CT) images. First, a three-dimensional (3D) network structure is applied in MAResNet based on the continuity and correlation of nodal features across different slices of CT images. Second, MAResNet incorporates the residual module and the Convolutional Block Attention Module (CBAM) to reuse the shallow features of CT images and focus on key features, enhancing the feature distinguishability of the images. In addition, multi-scale inputs can increase the global receptive field of the network, extract the location information of PTB, and capture the local details of nodules. The expression ability of both high-level and low-level semantic information in the network can also be enhanced. The proposed MAResNet shows excellent results, with an overall accuracy of 94% in PTB classification. MAResNet based on 3D CT images can assist doctors in making more accurate diagnoses of PTB and alleviate the burden of manual screening. In the experiments, gradient-weighted class activation mapping (Grad-CAM), an enhancement of the class activation mapping (CAM) technique, was employed to analyze the model's output; it can identify lesions in important parts of the lungs and make the model's decisions transparent.
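To make the attention mechanism concrete, the sketch below shows what a CBAM-style block for 3D feature maps can look like in PyTorch. This is an illustrative reconstruction, not the authors' code; the reduction ratio and the 7-voxel spatial kernel are common defaults assumed here.

```python
import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    """Channel attention: squeeze spatial dims with avg/max pooling."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                               # x: (B, C, D, H, W)
        b, c = x.shape[:2]
        avg = self.mlp(x.mean(dim=(2, 3, 4)))           # (B, C)
        mx = self.mlp(x.amax(dim=(2, 3, 4)))            # (B, C)
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1, 1)
        return x * scale

class SpatialAttention3D(nn.Module):
    """Spatial attention: squeeze channels, convolve the 2-channel map."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv3d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAM3D(nn.Module):
    """Channel attention followed by spatial attention, as in CBAM."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention3D(channels)
        self.sa = SpatialAttention3D()

    def forward(self, x):
        return self.sa(self.ca(x))

# Usage: x = torch.randn(2, 32, 16, 64, 64); y = CBAM3D(32)(x)  # same shape
```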
2
Reviewing 3D convolutional neural network approaches for medical image segmentation. Heliyon 2024; 10:e27398. [PMID: 38496891 PMCID: PMC10944240 DOI: 10.1016/j.heliyon.2024.e27398]
Abstract
Background Convolutional neural networks (CNNs) play a pivotal role in aiding clinicians in diagnosis and treatment decisions. The rapid evolution of imaging technology has established three-dimensional (3D) CNNs as a formidable framework for delineating organs and anomalies in medical images, and their prominence is steadily growing within medical image segmentation and classification. We therefore present a comprehensive review covering diverse 3D CNN algorithms for the segmentation of medical image anomalies and organs. Methods This study systematically presents an exhaustive review of recent 3D CNN methodologies. Rigorous screening of abstracts and titles was carried out to establish their relevance. Research papers disseminated across academic repositories were carefully chosen, analyzed, and appraised against specific criteria. Insights into anomaly and organ segmentation were derived, encompassing details such as network architecture and achieved accuracies. Results This paper offers an all-encompassing analysis, unveiling the prevailing trends in 3D CNN segmentation. In-depth elucidations cover essential insights, constraints, observations, and avenues for future exploration. A discerning examination indicates the preponderance of the encoder-decoder network in segmentation tasks, as the encoder-decoder framework affords a coherent methodology for the segmentation of medical images. Conclusion The findings of this study are poised to find application in clinical diagnosis and therapeutic interventions. Despite inherent limitations, CNN algorithms showcase commendable accuracy levels, solidifying their potential in medical image segmentation and classification endeavors.
3
ELTS-Net: An enhanced liver tumor segmentation network with augmented receptive field and global contextual information. Comput Biol Med 2024; 169:107879. [PMID: 38142549 DOI: 10.1016/j.compbiomed.2023.107879]
Abstract
The liver has one of the highest cancer incidence rates of any organ in the human body, and late-stage liver cancer is essentially incurable. Early diagnosis and lesion localization of liver cancer are therefore of important clinical value. This study proposes an enhanced network architecture, ELTS-Net, based on the 3D U-Net model, to address the limitations of conventional image segmentation methods and the underutilization of spatial image features by the 2D U-Net structure. ELTS-Net expands upon the original network by incorporating dilated convolutions to increase the receptive field of the convolutional kernels. Additionally, an attention residual module, comprising an attention mechanism and residual connections, replaces the original convolutional module, serving as the primary component of the encoder and decoder. This design enables the network to capture contextual information globally in both the channel and spatial dimensions. Furthermore, deep supervision modules are integrated between different levels of the decoder network, providing additional feedback from deeper intermediate layers; this constrains the network weights to the target regions and optimizes the segmentation results. Evaluation on the LiTS2017 dataset shows improvements over the baseline 3D U-Net model, achieving 95.2% liver segmentation accuracy and 71.9% tumor segmentation accuracy, improvements of 0.9% and 3.1% respectively. The experimental results validate the superior segmentation performance of ELTS-Net compared with other models, offering valuable guidance for clinical diagnosis and treatment.
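As a rough sketch of two of the ingredients named above, the PyTorch block below combines a dilated 3D convolution (to enlarge the receptive field without extra downsampling) with a residual connection. The channel count, dilation rate, and normalization choice are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class DilatedResBlock3D(nn.Module):
    """Residual 3D block with dilated convolutions: the effective
    receptive field grows with the dilation rate while the spatial
    resolution of the feature map is preserved."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.InstanceNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.InstanceNorm3d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual connection: identity plus the dilated-conv branch.
        return self.act(x + self.body(x))

# Usage: y = DilatedResBlock3D(32)(torch.randn(1, 32, 16, 64, 64))
```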
4
Using a generative adversarial network to generate synthetic MRI images for multi-class automatic segmentation of brain tumors. Front Radiol 2024; 3:1336902. [PMID: 38304344 PMCID: PMC10830800 DOI: 10.3389/fradi.2023.1336902]
Abstract
Challenging tasks such as lesion segmentation, classification, and analysis for the assessment of disease progression can be automatically achieved using deep learning (DL)-based algorithms. DL techniques such as 3D convolutional neural networks are trained using heterogeneous volumetric imaging data such as MRI, CT, and PET, among others. However, DL-based methods are usually only applicable when the desired number of inputs is present; in the absence of one of the required inputs, the method cannot be used. By implementing a generative adversarial network (GAN), we aim to apply multi-label automatic segmentation of brain tumors to synthetic images when not all inputs are present. The implemented GAN is based on the Pix2Pix architecture and has been extended to a 3D framework named Pix2PixNIfTI. For this study, 1,251 patients of the BraTS2021 dataset comprising T1w, T2w, T1CE, and FLAIR sequences with respective multi-label segmentations were used. This dataset was used to train the Pix2PixNIfTI model to generate synthetic MRI images for all the image contrasts. The segmentation model, DeepMedic, was trained in a five-fold cross-validation manner for brain tumor segmentation and tested using the original inputs as the gold standard. The trained segmentation models were later applied to synthetic images replacing the missing input, in combination with the other original images, to assess the efficacy of the generated images for multi-class segmentation. For multi-class segmentation using synthetic data or fewer inputs, the Dice scores were significantly reduced but remained similar in range for the whole tumor when compared with the evaluated original image segmentation (e.g., mean Dice of synthetic T2w prediction NC, 0.74 ± 0.30; ED, 0.81 ± 0.15; CET, 0.84 ± 0.21; WT, 0.90 ± 0.08). Standard paired t-tests with multiple-comparison correction were performed to assess the differences between all regions (p < 0.05). The study concludes that the use of Pix2PixNIfTI allows us to segment brain tumors when one input image is missing.
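The Pix2Pix objective that the framework builds on pairs an adversarial loss with an L1 reconstruction term. A minimal sketch of that objective on 3D volumes follows; the generator G and discriminator D are placeholders for any compatible 3D U-Net / PatchGAN pair, and the weighting lam=100 is the usual Pix2Pix default rather than a value reported by this paper.

```python
import torch
import torch.nn as nn

def pix2pix3d_losses(G, D, src, tgt, lam=100.0):
    """One forward pass of the conditional pix2pix objective on 3D
    volumes. src, tgt: (B, 1, D, H, W) source and target contrasts."""
    bce = nn.BCEWithLogitsLoss()
    fake = G(src)
    # Discriminator: real (src, tgt) pairs vs. fake (src, G(src)) pairs.
    d_real = D(torch.cat([src, tgt], dim=1))
    d_fake = D(torch.cat([src, fake.detach()], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    # Generator: fool D, plus L1 fidelity to the real target volume.
    loss_g = bce(D(torch.cat([src, fake], dim=1)),
                 torch.ones_like(d_fake)) + lam * (fake - tgt).abs().mean()
    return loss_d, loss_g
```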
5
Wireless capsule endoscopy multiclass classification using three-dimensional deep convolutional neural network model. Biomed Eng Online 2023; 22:124. [PMID: 38098015 PMCID: PMC10722702 DOI: 10.1186/s12938-023-01186-9]
Abstract
BACKGROUND Wireless capsule endoscopy (WCE) is a patient-friendly and non-invasive technology that scans the whole gastrointestinal tract, including difficult-to-access regions like the small bowel. A major drawback of this technology is that the visual inspection of the large number of video frames produced during each examination makes the physician's diagnostic process tedious and prone to error. Several computer-aided diagnosis (CAD) systems, such as deep network models, have been developed for the automatic recognition of abnormalities in WCE frames. Nevertheless, most of these studies have focused only on the spatial information within individual WCE frames, missing the crucial temporal information across consecutive frames. METHODS In this article, an automatic multiclass classification system based on a three-dimensional deep convolutional neural network (3D-CNN) is proposed, which utilizes spatiotemporal information to facilitate the WCE diagnosis process. The 3D-CNN model is fed with a series of sequential WCE frames, in contrast to the two-dimensional (2D) model, which treats frames as independent. Moreover, the proposed 3D deep model is compared with several pre-trained networks. The proposed models are trained and evaluated with 29 subject WCE videos (14,691 frames before augmentation). The performance advantages of the 3D-CNN over the 2D-CNN and pre-trained networks are verified in terms of sensitivity, specificity, and accuracy. RESULTS The 3D-CNN outperforms the 2D technique in all evaluation metrics (sensitivity: 98.92% vs. 98.05%, specificity: 99.50% vs. 86.94%, accuracy: 99.20% vs. 92.60%). CONCLUSION A novel 3D-CNN model for lesion detection in WCE frames is proposed in this study. The results indicate the superior performance of the 3D-CNN over the 2D-CNN and some well-known pre-trained classifier networks. The proposed 3D-CNN model uses the rich temporal information in adjacent frames as well as spatial data to develop an accurate and efficient model.
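The difference between the 2D and 3D treatment of frames comes down to how the input tensor is shaped. A minimal PyTorch illustration, with assumed frame counts and resolutions:

```python
import torch

# Eight consecutive WCE frames, 3 color channels, 224x224 pixels each
# (sizes are assumed for illustration).
frames = torch.randn(8, 3, 224, 224)

# 2D model: each frame is an independent sample -> a batch of 8 images.
batch_2d = frames                                   # (8, 3, 224, 224)

# 3D model: the same frames form ONE spatiotemporal sample, with time
# as the depth axis -> (batch, channels, time, height, width).
clip_3d = frames.permute(1, 0, 2, 3).unsqueeze(0)   # (1, 3, 8, 224, 224)
```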
6
The OCDA-Net: a 3D convolutional neural network-based system for classification and staging of ovarian cancer patients using [18F]FDG PET/CT examinations. Ann Nucl Med 2023; 37:645-654. [PMID: 37768493 DOI: 10.1007/s12149-023-01867-4]
Abstract
OBJECTIVE To create a 3D convolutional neural network (CNN)-based system that can use whole-body [18F]FDG PET for recurrence/post-therapy surveillance in ovarian cancer (OC). METHODS In this study, 1224 image sets from OC patients who underwent whole-body [18F]FDG PET/CT at Kowsar Hospital between April 2019 and May 2022 were investigated. For recurrence/post-therapy surveillance, diagnostic classification (cancerous vs. non-cancerous) and staging (stage III vs. stage IV) were determined by pathological diagnosis and specialists' interpretation. New deep neural network algorithms, the OCDAc-Net and the OCDAs-Net, were developed for diagnostic classification and staging of OC patients using [18F]FDG PET/CT images. Examinations were divided into independent training (75%), validation (10%), and testing (15%) subsets. RESULTS This study included 37 women (mean age 56.3 years; age range 36-83 years). Data augmentation techniques were applied to the images in two phases, yielding 1224 image sets for diagnostic classification and staging, of which 170 image sets formed the test set. The OCDAc-Net achieved an area under the receiver operating characteristic curve (AUC) of 0.990 and an overall accuracy of 0.92 for diagnostic classification; the OCDAs-Net achieved an AUC of 0.995 and an overall accuracy of 0.94 for staging. CONCLUSIONS The proposed 3D CNN-based models provide potential tools for recurrence/post-therapy surveillance in OC. The OCDAc-Net and OCDAs-Net models provide a new prognostic analysis method that can utilize PET images without pathological findings for diagnostic classification and staging.
7
Automated intracranial hemorrhage detection in traumatic brain injury using 3D CNN. J Neurosci Rural Pract 2023; 14:615-621. [PMID: 38059235 PMCID: PMC10696364 DOI: 10.25259/jnrp_172_2023]
Abstract
Objectives Intracranial hemorrhage (ICH) is a prevalent and potentially fatal consequence of traumatic brain injury (TBI). Timely identification of ICH is crucial to ensure timely intervention and optimize patient outcomes. However, the current methods for diagnosing ICH from head computed tomography (CT) scans require skilled personnel (radiologists and/or neurosurgeons) who may not be available in all centers, especially in rural areas. The aim of this study was to develop a neurotrauma screening tool for identifying ICH from head CT scans of TBI patients. Materials and Methods We prospectively collected head CT scans from the Department of Neurosurgery, All India Institute of Medical Sciences, New Delhi. Approximately 738 consecutive head CT scans from patients enrolled in the department were collected over a duration of 9 months, from January 2020 to September 2020. The metadata collected along with the head CT scans consisted of demographic and clinical details and the radiologist's report, which was used as the gold standard. A deep learning-based 3D convolutional neural network (CNN) model was trained on the dataset. The pre-processing, hyperparameters, and augmentation were common for training the 3D CNN model, whereas the training modules were set differently. The model was trained with a save-best-model option and monitored by validation metrics. Institute Ethics Committee permission was obtained before starting the study. Results We developed a 3D CNN model for automatically detecting ICH from head CT scans. The screening tool was trained on 200 head CT scans (99 normal and 101 with some type of ICH) and tested on 20 cases. The final model performed with 90% sensitivity, 70% specificity, and 80% accuracy. Conclusion Our study reveals that the automated screening tool exhibits a commendable level of accuracy and sensitivity in detecting ICH from head CT scans. The results indicate that the 3D CNN approach has potential for further exploration of TBI-related pathologies.
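The "save best model" strategy mentioned in the methods is simple to express in code. Below is a generic PyTorch training-loop sketch that keeps only the checkpoint with the best validation accuracy; the loader names, epoch count, and file path are illustrative assumptions, not details from the study.

```python
import torch

def train_with_best_checkpoint(model, loaders, optimizer, loss_fn,
                               epochs=50, path="best_ich_cnn.pt"):
    """Train and keep only the weights with the best validation accuracy."""
    best_acc = 0.0
    for epoch in range(epochs):
        model.train()
        for x, y in loaders["train"]:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        # Validation pass: monitor accuracy, save when it improves.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in loaders["val"]:
                correct += (model(x).argmax(1) == y).sum().item()
                total += y.numel()
        acc = correct / total
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), path)
```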
8
Identification of Turtle-Shell Growth Year Using Hyperspectral Imaging Combined with an Enhanced Spatial-Spectral Attention 3DCNN and a Transformer. Molecules 2023; 28:6427. [PMID: 37687257 PMCID: PMC10490299 DOI: 10.3390/molecules28176427]
Abstract
Turtle shell (Chinemys reevesii) is a prized traditional Chinese dietary therapy, and the growth year of a turtle shell has a significant impact on its quality attributes. In this study, a hyperspectral imaging (HSI) technique combined with a proposed deep learning (DL) network algorithm was investigated for the objective determination of the growth year of turtle shells. Hyperspectral images were acquired in the near-infrared range (948.72-2512.97 nm) from samples spanning five different growth years. To fully exploit the spatial and spectral information while simultaneously reducing redundancy in the hyperspectral data, three modules were developed. First, a spectral-spatial attention (SSA) module was developed to better preserve the correlation among spectral bands and capture fine-grained spatial information in the hyperspectral images. Second, a 3D convolutional neural network (CNN), better suited to the extracted 3D feature map, was employed to facilitate joint spatial-spectral feature representation. Third, to overcome the constraints of convolution kernels and better capture long-range correlations between spectral bands, a transformer encoder (TE) module was designed. These modules were orchestrated to effectively leverage both the spatial and spectral information within the hyperspectral data, and they collectively enhance the model's capacity to extract joint spatial-spectral features for accurately discerning growth years. Experiments demonstrated that the proposed model (named SSA-3DTE) achieved superior classification accuracy, averaging 98.94% for the five-category classification and outperforming both traditional machine learning methods using only spectral information and representative deep learning methods. Ablation experiments also confirmed the effectiveness of each module in improving performance. The encouraging results of this study reveal the potential of HSI combined with the DL algorithm as an efficient and non-destructive method for the quality control of turtle shells.
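To illustrate how a 3D convolutional stem can feed a transformer encoder over the spectral axis, here is a compact PyTorch sketch under assumed dimensions (16 x 16 x 150 band cubes, matching the ROI size often used for such data); the channel widths, head count, and layer depth are placeholders, not the published SSA-3DTE configuration.

```python
import torch
import torch.nn as nn

class Conv3DTransformer(nn.Module):
    """3D conv stem over (bands, H, W) cubes, then a transformer
    encoder treating each spectral band as a token."""
    def __init__(self, n_classes=5, d_model=64):
        super().__init__()
        self.stem = nn.Sequential(                   # input: (B, 1, bands, H, W)
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, d_model, 3, padding=1), nn.ReLU(inplace=True),
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                            # (B, 1, bands, 16, 16)
        f = self.stem(x)                             # (B, C, bands, 16, 16)
        tokens = f.mean(dim=(3, 4)).transpose(1, 2)  # (B, bands, C)
        return self.head(self.encoder(tokens).mean(dim=1))

# Usage: logits = Conv3DTransformer()(torch.randn(2, 1, 150, 16, 16))
```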
9
A deep-learning assisted bioluminescence tomography method to enable radiation targeting in rat glioblastoma. Phys Med Biol 2023. [PMID: 37385265 DOI: 10.1088/1361-6560/ace308]
Abstract
OBJECTIVE A novel solution is required for accurate 3D bioluminescence tomography (BLT)-based glioblastoma (GBM) targeting. The solution should be computationally efficient to support real-time treatment planning, thus reducing the X-ray imaging dose imposed by high-resolution micro cone-beam CT. APPROACH A novel deep-learning approach was developed to enable BLT-based tumor targeting and treatment planning for orthotopic rat GBM models. The proposed framework was trained and validated on a set of realistic Monte Carlo simulations, and the trained model was then tested on a limited set of BLI measurements from real rat GBM models. SIGNIFICANCE Bioluminescence imaging (BLI) is a 2D non-invasive optical imaging modality geared toward preclinical cancer research. It can be used to monitor tumor growth in small animal tumor models effectively and without radiation burden. However, the current state of the art does not allow accurate radiation treatment planning using BLI, limiting BLI's value in preclinical radiobiology research. RESULTS The proposed solution achieved sub-millimeter targeting accuracy on the simulated dataset, with a median Dice similarity coefficient (DSC) of 61%. The provided BLT-based planning volume achieved a median encapsulation of more than 97% of the tumor while keeping the median geometrical brain coverage below 4.2%. For the real BLI measurements, the proposed solution provided a median geometrical tumor coverage of 95% and a median DSC of 42%. Dose planning using a dedicated small animal treatment planning system indicated good BLT-based treatment planning accuracy compared with ground-truth CT-based planning, with dose-volume metrics for the tumor falling within the limits of agreement in more than 95% of cases. CONCLUSION The combination of flexibility, accuracy, and speed makes deep learning solutions a viable option for the BLT reconstruction problem, providing BLT-based tumor targeting for rat GBM models.
10
Automatic identification of schizophrenia based on EEG signals using dynamic functional connectivity analysis and 3D convolutional neural network. Comput Biol Med 2023; 160:107022. [PMID: 37187135 DOI: 10.1016/j.compbiomed.2023.107022]
Abstract
Schizophrenia (ScZ) is a devastating mental disorder of the human brain that seriously affects emotional inclinations, the quality of personal and social life, and healthcare systems. In recent years, deep learning methods with connectivity analysis have focused almost exclusively on fMRI data. To extend this line of research to the electroencephalogram (EEG) signal, this paper investigates the identification of ScZ from EEG using dynamic functional connectivity analysis and deep learning methods. A time-frequency domain functional connectivity analysis based on a cross mutual information algorithm is proposed to extract features in the alpha band (8-12 Hz) for each subject, and a 3D convolutional neural network is applied to classify ScZ subjects and healthy control (HC) subjects. The LMSU public ScZ EEG dataset was employed to evaluate the proposed method, yielding 97.74 ± 1.15% accuracy, 96.91 ± 2.76% sensitivity, and 98.53 ± 1.97% specificity. In addition, we found that not only the default mode network region but also the connectivity between the temporal lobe and the posterior temporal lobe, on both the right and left sides, differed significantly between ScZ and HC subjects.
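A minimal sketch of the kind of alpha-band mutual-information connectivity described above, using a simple histogram estimator; the paper's cross mutual information computation may differ, and the filter order, bin count, and sampling rate here are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def mutual_info(x, y, bins=16):
    """Histogram estimate of the mutual information between two signals."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

def alpha_band_connectivity(eeg, fs=250.0):
    """eeg: (channels, samples). Band-pass to 8-12 Hz, then build a
    symmetric channel-by-channel mutual-information matrix."""
    b, a = butter(4, [8 / (fs / 2), 12 / (fs / 2)], btype="band")
    alpha = filtfilt(b, a, eeg, axis=1)
    n = alpha.shape[0]
    conn = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            conn[i, j] = conn[j, i] = mutual_info(alpha[i], alpha[j])
    return conn
```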
11
Multi-Modal Feature Fusion-Based Multi-Branch Classification Network for Pulmonary Nodule Malignancy Suspiciousness Diagnosis. J Digit Imaging 2023; 36:617-626. [PMID: 36478311 PMCID: PMC10039149 DOI: 10.1007/s10278-022-00747-z]
Abstract
Detecting and identifying malignant nodules on chest computed tomography (CT) plays an important role in the early diagnosis and timely treatment of lung cancer, which can greatly reduce the number of deaths worldwide. Existing methods for pulmonary nodule diagnosis, however, ignore the importance of structured clinical radiological data (laboratory examinations, radiological findings) for accurately judging a patient's condition. Hence, a multi-modal fusion multi-branch classification network is constructed to detect and classify pulmonary nodules in this work: (1) radiological data on pulmonary nodules are used to construct structured features of length 9; (2) a multi-branch fusion-based effective attention mechanism network is designed for the unstructured 3D CT patch data, using a 3D ECA-ResNet to dynamically adjust the extracted features, and feature maps with different receptive fields from multiple layers are fully fused to obtain representative multi-scale unstructured features; (3) multi-modal feature fusion of the structured and unstructured data is performed to distinguish benign from malignant nodules. Extensive experimental results show that this network can effectively classify benign and malignant pulmonary nodules for clinical diagnosis, achieving the highest accuracy (94.89%), sensitivity (94.91%), and F1-score (94.65%) and the lowest false positive rate (5.55%).
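The fusion step, concatenating a 9-element structured clinical vector with pooled 3D CNN image features before the classifier, can be sketched as follows in PyTorch. The backbone below is a deliberately tiny stand-in for the 3D ECA-ResNet named above, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class MultiModalNoduleNet(nn.Module):
    """Fuse a 9-dim structured clinical vector with pooled features
    from a 3D CNN branch over the CT patch."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),              # global pooling -> (B, 32, 1, 1, 1)
        )
        self.fc = nn.Sequential(
            nn.Linear(32 + 9, 64), nn.ReLU(inplace=True),
            nn.Linear(64, n_classes),
        )

    def forward(self, ct_patch, clinical):        # (B, 1, D, H, W), (B, 9)
        img_feat = self.cnn(ct_patch).flatten(1)  # (B, 32)
        # Concatenate image features with the structured clinical vector.
        return self.fc(torch.cat([img_feat, clinical], dim=1))
```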
12
MFCNet: A multi-modal fusion and calibration networks for 3D pancreas tumor segmentation on PET-CT images. Comput Biol Med 2023; 155:106657. [PMID: 36791551 DOI: 10.1016/j.compbiomed.2023.106657]
Abstract
In clinical diagnosis, positron emission tomography and computed tomography (PET-CT) images containing complementary information are fused. Tumor segmentation based on multi-modal PET-CT images is an important part of clinical diagnosis and treatment. However, existing PET-CT tumor segmentation methods mainly focus on fusing positron emission tomography (PET) and computed tomography (CT) features, which weakens the specificity of each modality. In addition, the information interaction between different modal images is usually completed by simple addition or concatenation operations, which introduce irrelevant information during multi-modal semantic feature fusion, so effective features cannot be highlighted. To overcome this problem, this paper proposes a novel Multi-modal Fusion and Calibration Network (MFCNet) for tumor segmentation based on three-dimensional PET-CT images. First, a Multi-modal Fusion Down-sampling Block (MFDB) with a residual structure is developed; the proposed MFDB can fuse complementary features of multi-modal images while retaining the unique features of each modality. Second, a Multi-modal Mutual Calibration Block (MMCB) based on the inception structure is designed; the MMCB can guide the network to focus on the tumor region by combining decoding features from different branches via an attention mechanism and by extracting multi-scale pathological features using convolution kernels of different sizes. The proposed MFCNet is verified on both a public dataset (head and neck cancer) and an in-house dataset (pancreatic cancer). The experimental results indicate that on the public and in-house datasets, the average Dice values of the proposed network are 74.14% and 76.20%, while the average Hausdorff distances are 6.41 and 6.84, respectively. The experimental results also show that MFCNet outperforms state-of-the-art methods on both datasets.
13
High precision tracking analysis of cell position and motion fields using 3D U-net network models. Comput Biol Med 2023; 154:106577. [PMID: 36753978 DOI: 10.1016/j.compbiomed.2023.106577]
Abstract
Cells are the basic units of biological organization, and the quantitative analysis of cellular states is an important topic in medicine, valuable for revealing the complex mechanisms of microscopic organisms. To better understand cell cycle changes and drug actions, we need to track cell migration and division. In this paper, we propose a novel engineering model for tracking cells using cell position and motion fields (CPMF). The training samples do not need to be manually annotated; they are modified and edited against the ground truth using auxiliary tools. The core idea of the project is to combine detection and correlation: cell sequence samples are used to train a U-Net model composed of 3D CNNs, which can track the migration, division, and entry and exit of cells in the field of view with high accuracy in all directions. The average detection accuracy of the cell coordinates is 98.38% and the average tracking accuracy is 98.70%.
14
Application of high resolution computed tomography image assisted classification model of middle ear diseases based on 3D-convolutional neural network. Zhong Nan Da Xue Xue Bao Yi Xue Ban 2022; 47:1037-1048. [PMID: 36097771 PMCID: PMC10950109 DOI: 10.11817/j.issn.1672-7347.2022.210704]
Abstract
OBJECTIVES Chronic suppurative otitis media (CSOM) and middle ear cholesteatoma (MEC) are the 2 most common chronic middle ear diseases. During diagnosis and treatment, the 2 diseases are prone to misdiagnosis and missed diagnosis due to their similar clinical manifestations. High resolution computed tomography (HRCT) can clearly display the fine anatomical structures of the temporal bone, accurately reflect middle ear lesions and their extent, and has advantages in the differential diagnosis of chronic middle ear diseases. This study aims to develop a deep learning model for automatic information extraction and classification diagnosis of chronic middle ear diseases based on temporal bone HRCT image data, to improve the efficiency of classification and diagnosis in clinical practice and to reduce missed diagnoses and misdiagnoses. METHODS The clinical records and temporal bone HRCT imaging data of patients with chronic middle ear diseases hospitalized in the Department of Otorhinolaryngology, Xiangya Hospital from January 2018 to October 2020 were retrospectively collected. The medical records were independently reviewed by 2 experienced otorhinolaryngologists, and the final diagnosis was reached by consensus. A total of 499 patients (998 ears) were enrolled and divided into 3 groups: an MEC group (108 ears), a CSOM group (622 ears), and a normal group (268 ears). Gaussian noise with different variances was used to augment the dataset to offset the imbalance in sample numbers between groups, giving an amplified experimental dataset of 1806 ears. In the study, 75% (1355) of the samples were randomly selected for training, 10% (180) for validation, and the remaining 15% (271) for testing and evaluating model performance. The overall design was a serial structure of 3 deep learning models with different functions. The first was a region-proposal network that located the middle ear within the whole HRCT image and then cropped and saved it. The second was an image-comparison convolutional neural network (CNN) based on a twin (Siamese) network structure, which searched the cropped images for those matching the key slices of the HRCT series and constructed 3D data blocks. The third was based on 3D-CNN operations and performed the final classification and diagnosis of the constructed 3D data blocks, giving the final prediction probabilities. RESULTS The key-slice search network based on the twin network structure showed an average AUC of 0.939 across 10 key slices. The overall accuracy of the 3D-CNN classification network was 96.5%, the overall recall rate was 96.4%, and the average AUC over the 3 classes was 0.983. The recall rates for CSOM and MEC cases were 93.7% and 97.4%, respectively. In comparison experiments, the average accuracy of several classical CNNs was 79.3% and their average recall rate was 87.6%; the precision and recall of the network constructed in this study were about 17.2% and 8.8% higher, respectively.
CONCLUSIONS The proposed deep learning network model can automatically extract 3D data blocks containing middle ear features from temporal bone HRCT data, reducing the overall data size while preserving the relationships between corresponding images, and can then use the 3D-CNN for classification and diagnosis of CSOM and MEC. The design fits well with the continuous nature of HRCT data, and the experimental results show high precision and adaptability, better than current common CNN methods.
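A twin (Siamese) slice matcher of the kind used in the second stage can be sketched as below: one shared 2D encoder embeds both a candidate slice and a reference key slice, and a small head scores the match. This is an illustrative reconstruction; the encoder depth, embedding size, and scoring head are assumptions.

```python
import torch
import torch.nn as nn

class TwinSliceMatcher(nn.Module):
    """Siamese (twin) network: the SAME 2D encoder embeds a candidate
    HRCT slice and a reference key slice; a small head scores whether
    they depict the same anatomical level."""
    def __init__(self, emb=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb),
        )
        self.head = nn.Linear(emb, 1)

    def forward(self, slice_a, slice_b):           # both (B, 1, H, W)
        # Shared weights: both inputs pass through the same encoder.
        diff = torch.abs(self.encoder(slice_a) - self.encoder(slice_b))
        return torch.sigmoid(self.head(diff))      # match probability
```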
15
Air Pollution Detection Using a Novel Snap-Shot Hyperspectral Imaging Technique. Sensors (Basel) 2022; 22:6231. [PMID: 36015992 PMCID: PMC9416790 DOI: 10.3390/s22166231]
Abstract
Air pollution has emerged as a global problem in recent years. In particular, particulate matter (PM2.5) with a diameter of less than 2.5 μm can move through the air and transfer dangerous compounds to the lungs through human breathing, thereby creating major health issues. This research proposes a large-scale, low-cost solution for detecting air pollution by combining hyperspectral imaging (HSI) technology and deep learning techniques. By modeling the visible-light HSI technology of the aerial camera, the images acquired by the drone camera are endowed with hyperspectral information. Two methods are used to classify the images: a 3D convolutional neural network autoencoder, and principal component analysis (PCA) paired with VGG-16 (Visual Geometry Group), to find the optical properties of air pollution. The images are classified as good, moderate, or severe based on the concentration of PM2.5 particles in the images. The results suggest that PCA + VGG-16 has the highest average classification accuracy, 85.93%.
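The PCA + VGG-16 pairing reduces the hyperspectral cube to three principal-component channels so it can be fed to an RGB-style network. A sketch under assumptions (recent scikit-learn/torchvision APIs; component count and scaling are illustrative, not the paper's exact pipeline):

```python
import numpy as np
import torch
from sklearn.decomposition import PCA
from torchvision.models import vgg16

def pca_to_vgg_input(cube, n_components=3):
    """cube: one hyperspectral image, (H, W, bands). PCA keeps 3
    components so the result matches VGG-16's 3-channel input."""
    h, w, bands = cube.shape
    comp = PCA(n_components=n_components).fit_transform(cube.reshape(-1, bands))
    comp = (comp - comp.min(0)) / (np.ptp(comp, axis=0) + 1e-8)  # scale to [0, 1]
    img = torch.from_numpy(comp.reshape(h, w, 3)).float()
    return img.permute(2, 0, 1).unsqueeze(0)        # (1, 3, H, W)

# Replace the final layer for the 3 pollution classes (good/moderate/severe).
model = vgg16(weights=None)
model.classifier[6] = torch.nn.Linear(4096, 3)
```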
16
Depth Estimation for Integral Imaging Microscopy Using a 3D-2D CNN with a Weighted Median Filter. Sensors (Basel) 2022; 22:5288. [PMID: 35890968 PMCID: PMC9316143 DOI: 10.3390/s22145288]
Abstract
This study proposes a robust depth map framework based on a convolutional neural network (CNN) to calculate disparities using multi-direction epipolar plane images (EPIs). A combination of three-dimensional (3D) and two-dimensional (2D) CNN-based deep learning networks is used to extract the features from each input stream separately. The 3D convolutional blocks are adapted according to the disparity of the different directions of the epipolar images, and 2D CNNs are employed to minimize data loss. Finally, the multi-stream networks are merged to restore the depth information. The fully convolutional approach is scalable, can handle inputs of any size, and is less prone to overfitting; however, some noise remains along edge directions. To overcome this issue, weighted median filtering (WMF) is used to acquire the boundary information and improve the accuracy of the results. Experimental results indicate that the suggested deep learning network architecture outperforms other architectures in terms of depth estimation accuracy.
17
Generalizability assessment of COVID-19 3D CT data for deep learning-based disease detection. Comput Biol Med 2022; 145:105464. [PMID: 35390746 PMCID: PMC8971071 DOI: 10.1016/j.compbiomed.2022.105464]
Abstract
BACKGROUND Artificial intelligence technologies for the classification/detection of COVID-19-positive cases suffer from poor generalizability. Moreover, accessing and preparing another large dataset is not always feasible and is time-consuming. Several studies have combined smaller COVID-19 CT datasets into "supersets" to maximize the number of training samples. This study aims to assess generalizability by splitting datasets into different portions based on 3D CT images using deep learning. METHOD Two large datasets, comprising 1110 3D CT images, were split into five segments of 20% each. The first 20% segment of each dataset was set aside as a holdout test set. 3D-CNN training was performed with the remaining 80% of each dataset. Two small external datasets were also used to independently evaluate the trained models. RESULTS The model trained on the combined 80% portions of both datasets achieved an accuracy of 91% on the Iranmehr holdout test set and 83% on the Moscow holdout test set. The results indicated that 80% of each primary dataset is adequate for fully training a model, and additional fine-tuning on 40% of a secondary dataset helps the model generalize to a third, unseen dataset. The highest accuracy achieved through transfer learning was 85% on the LDCT dataset and 83% on the Iranmehr holdout test set when retrained on 80% of the Iranmehr dataset. CONCLUSION While the combination of both datasets produced the best results, different combinations and transfer learning still produced generalizable results. Adopting the proposed methodology may help to obtain satisfactory results when external datasets are limited.
18
A Hyperspectral Data 3D Convolutional Neural Network Classification Model for Diagnosis of Gray Mold Disease in Strawberry Leaves. Front Plant Sci 2022; 13:837020. [PMID: 35360322 PMCID: PMC8963811 DOI: 10.3389/fpls.2022.837020]
Abstract
Gray mold disease is one of the most frequently occurring diseases in strawberries. Given that it spreads rapidly, rapid countermeasures are necessary through the development of early diagnosis technology. In this study, hyperspectral images were taken of strawberry leaves that had been inoculated with gray mold fungus; these images were classified into healthy and infected areas as seen by the naked eye, and areas where the infection spread after time had elapsed were classified as an asymptomatic class. Square regions of interest (ROIs) with dimensions of 16 × 16 × 150 were acquired as training data, including infected, asymptomatic, and healthy areas. Then, 2D and 3D data were used to develop convolutional neural network (CNN) classification models, preceded by an effective-wavelength analysis. The classification model developed with 2D training data showed a classification accuracy of 0.74, while the model using 3D data achieved 0.84, indicating that the 3D data produced slightly better performance. When classifying between healthy and asymptomatic areas for early diagnosis, the two CNN models showed a classification accuracy of 0.73 on asymptomatic areas. To increase the accuracy on asymptomatic areas, a model was developed by smoothing the spectral data and expanding it with the first and second derivatives; this raised the asymptomatic classification accuracy to 0.77 and reduced the misclassification of asymptomatic areas as healthy. Based on these results, it is concluded that the proposed 3D CNN classification model can be used as an early diagnosis sensor for gray mold disease, since it produces immediate on-site analysis results from hyperspectral images of leaves.
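Smoothing a spectrum and expanding it with its first and second derivatives is a standard chemometric preprocessing step. A minimal sketch with SciPy; the window length and polynomial order are assumed values, not the study's settings.

```python
import numpy as np
from scipy.signal import savgol_filter

def expand_spectrum(spectra, window=11, poly=2):
    """spectra: (n_pixels, n_bands). Returns the smoothed spectrum
    stacked with its first and second derivatives -> (n_pixels, 3*n_bands)."""
    smoothed = savgol_filter(spectra, window, poly, deriv=0, axis=1)
    d1 = savgol_filter(spectra, window, poly, deriv=1, axis=1)
    d2 = savgol_filter(spectra, window, poly, deriv=2, axis=1)
    return np.concatenate([smoothed, d1, d2], axis=1)
```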
19
Noncontact Sleep Monitoring With Infrared Video Data to Estimate Sleep Apnea Severity and Distinguish Between Positional and Nonpositional Sleep Apnea: Model Development and Experimental Validation. J Med Internet Res 2021; 23:e26524. [PMID: 34723817 PMCID: PMC8593819 DOI: 10.2196/26524]
Abstract
Background Sleep apnea is a respiratory disorder characterized by frequent breathing cessation during sleep. Sleep apnea severity is determined by the apnea-hypopnea index (AHI), which is the hourly rate of respiratory events. In positional sleep apnea, the AHI is higher in the supine sleeping position than in other sleeping positions. Positional therapy is a behavioral strategy (e.g., wearing an item to encourage sleeping in the lateral position) to treat positional apnea. The gold standard for diagnosing sleep apnea and determining whether it is positional is polysomnography; however, this test is inconvenient and expensive, and it has a long waiting list. Objective The objective of this study was to develop and evaluate a noncontact method to estimate sleep apnea severity and to distinguish positional from nonpositional sleep apnea. Methods A noncontact deep-learning algorithm was developed to analyze infrared video of sleep, estimating the AHI and distinguishing patients with positional vs nonpositional sleep apnea. Specifically, a 3D convolutional neural network (CNN) architecture was used to process movements extracted by optical flow to detect respiratory events. Positional sleep apnea patients were subsequently identified by combining the AHI information provided by the 3D-CNN model with the sleeping position (supine vs lateral) detected via a previously developed CNN model. Results The algorithm was validated on data from 41 participants, including 26 men and 15 women with a mean age of 53 (SD 13) years, BMI of 30 (SD 7), AHI of 27 (SD 31) events/hour, and sleep duration of 5 (SD 1) hours; 20 participants had positional sleep apnea, 15 had nonpositional sleep apnea, and positional status could not be determined for the remaining 6. AHI values estimated by the 3D-CNN model correlated strongly and significantly with the gold standard (Spearman correlation coefficient 0.79, P<.001). Individuals with positional sleep apnea (based on an AHI threshold of 15) were identified with 83% accuracy and an F1-score of 86%. Conclusions This study demonstrates the possibility of using a camera-based method to develop an accessible and easy-to-use device for screening sleep apnea at home, which could be provided in the form of a tablet or smartphone app.
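The pipeline described above feeds dense optical-flow fields from infrared video into a 3D CNN. A minimal sketch of the flow-extraction step with OpenCV's Farneback method; the parameter values are common defaults assumed here rather than taken from the paper.

```python
import cv2
import numpy as np

def flow_clip(gray_frames):
    """gray_frames: list of consecutive infrared frames (H, W), uint8.
    Returns a (2, T-1, H, W) stack of dense optical-flow fields, ready
    to be used as a two-channel 3D-CNN input clip."""
    flows = []
    for prev, nxt in zip(gray_frames[:-1], gray_frames[1:]):
        f = cv2.calcOpticalFlowFarneback(
            prev, nxt, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(f)                       # each f: (H, W, 2)
    return np.stack(flows, axis=0).transpose(3, 0, 1, 2)
```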
20
DW-UNet: Loss Balance under Local-Patch for 3D Infection Segmentation from COVID-19 CT Images. Diagnostics (Basel) 2021; 11:1942. [PMID: 34829289 PMCID: PMC8623821 DOI: 10.3390/diagnostics11111942]
Abstract
(1) Background: COVID-19 has become a global epidemic. This work aims to extract 3D infection regions from COVID-19 CT images. (2) Methods: First, COVID-19 CT images are processed with lung region extraction and data enhancement. In this strategy, gradient changes of voxels in different directions respond to geometric characteristics. Due to the complexity of the tubular tissues in the lung region, they are clustered to the lung parenchyma center based on their filtered probability, so the infection regions are improved after data enhancement. Then, a deep weighted UNet is established to refine the 3D infection texture, and a weighted loss function is introduced; it changes the cost calculation for different samples, causing target samples to dominate the convergence direction. Finally, the trained network effectively extracts 3D infection from CT images by adjusting the driving strategy of different samples. (3) Results: Using accuracy, precision, recall, and coincidence rate, 20 subjects from a private dataset and 8 subjects from the Kaggle Competition COVID-19 CT dataset were used to test this method in a hold-out validation framework. The method performed well on both the private dataset (99.94 ± 0.02%, 60.42 ± 11.25%, 70.79 ± 9.35%, and 63.15 ± 8.35%) and the public dataset (99.73 ± 0.12%, 77.02 ± 6.06%, 41.23 ± 8.61%, and 52.50 ± 8.18%). Additional indicators were applied to test the data augmentation and different models, and statistical tests verified the significant differences between models. (4) Conclusions: This study provides a COVID-19 infection segmentation technique, an important prerequisite for the quantitative analysis of COVID-19 CT images.
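The weighted-loss idea, letting the rare infection voxels dominate the convergence direction, can be expressed as a per-voxel weighted binary cross-entropy. A PyTorch sketch; the weight value is an assumed example, and the paper's exact weighting scheme may differ.

```python
import torch
import torch.nn.functional as F

def weighted_voxel_bce(logits, target, pos_weight=10.0):
    """Binary cross-entropy over voxels in which infection voxels
    (the rare positive class) carry a larger per-voxel weight, so they
    dominate the gradient. logits, target: (B, 1, D, H, W), target in {0, 1}."""
    w = torch.where(target > 0.5,
                    torch.full_like(target, pos_weight),
                    torch.ones_like(target))
    return F.binary_cross_entropy_with_logits(logits, target, weight=w)
```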
21
3D multi-scale, multi-task, and multi-label deep learning for prediction of lymph node metastasis in T1 lung adenocarcinoma patients' CT images. Comput Med Imaging Graph 2021; 93:101987. [PMID: 34610501 DOI: 10.1016/j.compmedimag.2021.101987]
Abstract
The diagnosis of preoperative lymph node (LN) metastasis is crucial for evaluating possible therapy options for T1 lung adenocarcinoma patients. Radiologists preoperatively diagnose LN metastasis by evaluating signs related to it, like spiculation or lobulation of pulmonary nodules in CT images. However, this type of evaluation is subjective and time-consuming, which may result in poor consistency and low efficiency of diagnoses. In this study, a 3D Multi-scale, Multi-task, and Multi-label classification network (3M-CN) is proposed to predict LN metastasis and to evaluate multiple related signs of pulmonary nodules in order to improve the accuracy of LN metastasis prediction. The following key approaches were adopted. First, a multi-scale feature fusion module was proposed to aggregate features from different levels, since different labels are best modeled at different levels; second, an auxiliary segmentation task was applied to force the model to focus more on the nodule region and less on surrounding unrelated structures; and third, a cross-modal integration module called the refine layer was designed to integrate related risk factors into the model to further improve its confidence level. The 3M-CN was trained using data from 401 cases and then validated on internal and external datasets of 100 and 53 cases, respectively. The proposed 3M-CN model was compared with existing state-of-the-art methods for the prediction of LN metastasis and outperformed them, achieving the best performance with AUCs of 0.945 and 0.948 on the internal and external test datasets, respectively. The proposed model not only achieves strong generalization but also greatly enhances the interpretability of the deep learning model, increases doctors' confidence in its results, conforms to doctors' diagnostic process, and may be transferable to the diagnosis of other diseases.
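The multi-task idea, classification plus an auxiliary segmentation head over the nodule region, boils down to a weighted sum of per-task losses. A minimal PyTorch sketch; the loss types and the 0.5 task weight are illustrative assumptions, not the published configuration.

```python
import torch.nn as nn

def multi_task_loss(cls_logits, cls_targets, seg_logits, seg_mask,
                    seg_weight=0.5):
    """Joint objective: multi-label classification of LN metastasis and
    nodule signs, plus an auxiliary segmentation task that forces the
    network to attend to the nodule region.
    cls_logits/cls_targets: (B, n_labels); seg_logits/seg_mask: (B, 1, D, H, W)."""
    cls_loss = nn.functional.binary_cross_entropy_with_logits(
        cls_logits, cls_targets)
    seg_loss = nn.functional.binary_cross_entropy_with_logits(
        seg_logits, seg_mask)
    return cls_loss + seg_weight * seg_loss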
22
An end-to-end 3D convolutional neural network for decoding attentive mental state. Neural Netw 2021; 144:129-137. [PMID: 34492547 DOI: 10.1016/j.neunet.2021.08.019]
Abstract
The detection of attentive mental state plays an essential role in the neurofeedback process and in the treatment of attention deficit hyperactivity disorder (ADHD). However, the performance of existing detection methods is still not satisfactory. One of the challenges is to find a proper representation for the electroencephalogram (EEG) data that preserves the temporal information and maintains the spatial topological characteristics. Inspired by deep learning (DL) methods in brain-computer interface (BCI) research, a 3D representation of the EEG signal was introduced into the attention detection task, and a 3D convolutional neural network model with cascade and parallel convolution operations was proposed. The model uses three cascade blocks, each consisting of two parallel 3D convolution branches, to simultaneously extract multi-scale features. Evaluated on a public dataset containing twenty-six subjects, the proposed model achieved better performance than the baseline methods under intra-subject, inter-subject, and subject-adaptive classification scenarios. This study demonstrates the promising potential of the 3D CNN model for detecting attentive mental state.
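A cascade stage built from two parallel 3D convolution branches can be sketched as follows; the kernel sizes (3 and 5), channel widths, and pooling layout are assumptions for illustration rather than the published configuration.

```python
import torch
import torch.nn as nn

class ParallelBlock3D(nn.Module):
    """One cascade stage: two parallel 3D conv branches with different
    kernel sizes extract multi-scale features, then are concatenated."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch_a = nn.Sequential(
            nn.Conv3d(in_ch, out_ch // 2, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch // 2), nn.ReLU(inplace=True))
        self.branch_b = nn.Sequential(
            nn.Conv3d(in_ch, out_ch // 2, kernel_size=5, padding=2),
            nn.BatchNorm3d(out_ch // 2), nn.ReLU(inplace=True))

    def forward(self, x):
        return torch.cat([self.branch_a(x), self.branch_b(x)], dim=1)

# Three stages in cascade, mirroring the description above:
net = nn.Sequential(ParallelBlock3D(1, 16), nn.MaxPool3d(2),
                    ParallelBlock3D(16, 32), nn.MaxPool3d(2),
                    ParallelBlock3D(32, 64))
```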
23
COVID-19 identification from volumetric chest CT scans using a progressively resized 3D-CNN incorporating segmentation, augmentation, and class-rebalancing. Inform Med Unlocked 2021; 26:100709. [PMID: 34642640 PMCID: PMC8494187 DOI: 10.1016/j.imu.2021.100709]
Abstract
The novel COVID-19 is a global pandemic disease spreading rapidly worldwide. Computer-aided screening tools with greater sensitivity are imperative for disease diagnosis and prognosis as early as possible; such tools can also help in triage for testing and in the clinical supervision of COVID-19 patients. However, designing such an automated tool from non-invasive radiographic images is challenging, as many manually annotated datasets, the essential core requirement of supervised learning schemes, are not yet publicly available. This article proposes a 3D convolutional neural network (CNN)-based classification approach considering both the inter- and intra-slice spatial voxel information. The proposed system is trained end-to-end on 3D patches from whole volumetric computed tomography (CT) images to enlarge the number of training samples, with ablation studies performed to determine the patch size. We integrate progressive resizing, segmentation, augmentation, and class-rebalancing into our 3D network. The segmentation is a critical prerequisite step for COVID-19 diagnosis, enabling the classifier to learn prominent lung features while excluding the outer lung regions of the CT scans. We evaluate all the experiments on a publicly available dataset named MosMed, which has binary- and multi-class chest CT image partitions. Our experimental results are very encouraging, yielding areas under the receiver operating characteristic (ROC) curve of 0.914 ± 0.049 and 0.893 ± 0.035 for the binary- and multi-class tasks, respectively, using 5-fold cross-validation. These promising results make our method a favorable aiding tool for clinical practitioners and radiologists to assess COVID-19.
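Training on 3D patches rather than whole volumes is a simple windowing operation. A NumPy sketch; the patch and stride sizes are placeholders, not the values the ablation studies settled on.

```python
import numpy as np

def extract_patches(volume, patch=(32, 64, 64), stride=(16, 32, 32)):
    """Slide a 3D window over a CT volume (D, H, W) to enlarge the
    number of training samples while keeping inter- and intra-slice context."""
    d, h, w = volume.shape
    pd, ph, pw = patch
    sd, sh, sw = stride
    out = []
    for z in range(0, d - pd + 1, sd):
        for y in range(0, h - ph + 1, sh):
            for x in range(0, w - pw + 1, sw):
                out.append(volume[z:z + pd, y:y + ph, x:x + pw])
    return np.stack(out)          # (n_patches, pd, ph, pw)
```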
24
[Development and application of computer vision-based acupuncture manipulation classification system]. Zhen Ci Yan Jiu 2021; 46:469-473. [PMID: 34190449 DOI: 10.13702/j.1000-0607.20210154]
Abstract
OBJECTIVE To improve the accuracy of acupuncture manipulation modeling and inheritance, this article explores the feasibility of using computer vision technology to automatically classify "twirling" and "lifting and thrusting", two basic acupuncture manipulations in the science of acupuncture and moxibustion. METHODS A hybrid deep learning network model was designed based on a 3D convolutional neural network and a long short-term memory neural network to extract the spatial-temporal features of video frame sequences, which were then input into the classifier. RESULTS The model discriminated between "twirling" and "lifting and thrusting" manipulations in 200 videos, with training and verification accuracies reaching 95.4% and 95.3%, respectively. CONCLUSION This computer vision-based acupuncture manipulation classification system provides an effective way to extract and pass on data about acupuncture manipulations.
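A hybrid of 3D convolutions (short-range spatiotemporal features) and an LSTM (long-range aggregation over the clip) can be sketched like this; all layer sizes are assumptions for illustration, not the published design.

```python
import torch
import torch.nn as nn

class CNN3DLSTM(nn.Module):
    """3D convs extract short-range spatiotemporal features from the
    video; an LSTM aggregates them over the whole clip; a linear
    classifier separates the two manipulation classes."""
    def __init__(self, hidden=128, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),    # keep the time axis
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, clip):                         # (B, 3, T, H, W)
        f = self.conv(clip).squeeze(-1).squeeze(-1)  # (B, 32, T)
        out, _ = self.lstm(f.transpose(1, 2))        # (B, T, hidden)
        return self.fc(out[:, -1])                   # last time step
```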
25
Diagnosis of COVID-19 Pneumonia Based on Graph Convolutional Network. Front Med (Lausanne) 2021; 7:612962. [PMID: 33585511 PMCID: PMC7875085 DOI: 10.3389/fmed.2020.612962]
Abstract
A three-dimensional (3D) deep learning method is proposed, which enables the rapid diagnosis of coronavirus disease 2019 (COVID-19) and thus significantly reduces the burden on radiologists and physicians. Inspired by the fact that current chest computed tomography (CT) datasets are diversified in equipment type, we propose a COVID-19 graph in a graph convolutional network (GCN) to incorporate multiple datasets for differentiating COVID-19-infected cases from normal controls. Specifically, we first apply a 3D convolutional neural network (3D-CNN) to extract image features from the initial 3D CT images. Here, a transfer learning method is proposed to improve performance, using the task of predicting equipment type to initialize the parameters of the 3D-CNN structure. Second, we design a COVID-19 graph in the GCN based on the extracted features. The graph divides all samples into several clusters, with samples of the same equipment type composing a cluster, and edge connections are established between samples in the same cluster. To compute accurate edge weights, we propose combining the correlation distance of the extracted features with the score differences of subjects from the 3D-CNN structure. Lastly, by inputting the COVID-19 graph into the GCN, we obtain the final diagnosis results. In the experiments, the dataset contained 399 COVID-19-infected cases and 400 normal controls from six equipment types. The experimental results show that the accuracy, sensitivity, and specificity of our method reach 98.5%, 99.9%, and 97%, respectively.
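The GCN propagation rule and a correlation-based edge weighting of the kind described above can be sketched in NumPy as follows. This is an illustrative reconstruction; the paper additionally mixes in the 3D-CNN score differences, which are omitted here for brevity.

```python
import numpy as np

def gcn_layer(features, adj, weight):
    """One graph-convolution propagation step:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weight, 0)

def correlation_edge_weights(feats):
    """Edge weights from the correlation between the 3D-CNN feature
    vectors of two samples (higher correlation -> heavier edge).
    feats: (n_samples, n_features)."""
    f = feats - feats.mean(axis=1, keepdims=True)
    f /= (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    return np.clip(f @ f.T, 0, None)   # keep non-negative similarities
```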
26
Low Frequency Vibration Visual Monitoring System Based on Multi-Modal 3DCNN-ConvLSTM. Sensors (Basel) 2020; 20:5872. [PMID: 33080814 PMCID: PMC7589111 DOI: 10.3390/s20205872]
Abstract
Low frequency vibration monitoring has significant implications for environmental safety and engineering practice. Vibration captured as visual information contains rich spatial cues, and an RGB-D camera can record diverse spatial information about vibration in frame images. Deep learning can adaptively transform frame images into deep abstract features through nonlinear mapping, which is an effective way to improve the intelligence of vibration monitoring. In this paper, a multi-modal low frequency visual vibration monitoring system based on Kinect v2 and a 3DCNN-ConvLSTM architecture is proposed. The Microsoft Kinect v2 collects RGB and depth video information of vibrating objects under unstable ambient light. The 3DCNN-ConvLSTM architecture can effectively learn the spatial-temporal characteristics of multi-frequency vibration: short-term spatiotemporal features of the collected vibration information are learned through 3D convolution networks, and long-term spatiotemporal features are learned through convolutional LSTM. Multi-modal fusion of the RGB and depth modes further improves the monitoring accuracy to 93% in the low frequency vibration range of 0-10 Hz. The results show that the system can monitor low frequency vibration and meets basic measurement requirements.
Collapse
|
27
|
Classifying Autism Spectrum Disorder Using the Temporal Statistics of Resting-State Functional MRI Data With 3D Convolutional Neural Networks. Front Psychiatry 2020; 11:440. [PMID: 32477198 PMCID: PMC7242627 DOI: 10.3389/fpsyt.2020.00440] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/10/2020] [Accepted: 04/28/2020] [Indexed: 11/13/2022] Open
Abstract
Resting-state functional magnetic resonance imaging (rs-fMRI) data are 4-dimensional volumes (3-space + 1-time) that have been posited to reflect the underlying mechanisms of information exchange between brain regions, making rs-fMRI an attractive modality for developing diagnostic biomarkers of brain dysfunction. The enormous success of deep learning in computer vision has sparked recent interest in applying deep learning to neuroimaging. But the dimensionality of rs-fMRI data is too high (~20 M), making it difficult to meaningfully process the data in its raw form for deep learning experiments. It is currently not clear how the data should be engineered to optimally extract the time information, or whether combining different representations of time could provide better results. In this paper, we explored various transformations that retain the full spatial resolution by summarizing the temporal dimension of the rs-fMRI data, making it possible to train a full three-dimensional convolutional neural network (3D-CNN) even on a moderately sized dataset [~2,000 subjects from the Autism Brain Imaging Data Exchange (ABIDE)-I and II]. These transformations summarize the activity in each voxel of the rs-fMRI, or in the voxel and its neighbors, as a single number. For each brain volume, we calculated regional homogeneity, the amplitude of low-frequency fluctuations, the fractional amplitude of low-frequency fluctuations, degree centrality, eigenvector centrality, local functional connectivity density, entropy, voxel-mirrored homotopic connectivity, and auto-correlation lag. We trained the 3D-CNN on a publicly available autism dataset to classify the rs-fMRI images as coming from individuals with autism spectrum disorder (ASD) or from healthy controls (CON) at the individual level. We attained a classification accuracy of ~66% on the combined ABIDE-I and II datasets, which is competitive for this task. When all summary measures were combined, the result was still only as good as that of the best single measure, which was regional homogeneity (ReHo). In addition, we applied the support vector machine (SVM) algorithm to the same dataset and achieved comparable results, suggesting that the 3D-CNNs could not learn additional information from these temporal transformations that was more useful for differentiating ASD from CON.
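Two of the temporal summary measures listed above are simple enough to sketch directly. The NumPy snippet below (a minimal illustration, not the authors' code) collapses a 4D rs-fMRI array to 3D volumes via the voxel-wise lag-1 auto-correlation and an ALFF-style low-frequency amplitude; each resulting volume can then serve as input to a 3D-CNN.

```python
import numpy as np

def lag1_autocorrelation(fmri):
    """Collapse a 4D rs-fMRI array (X, Y, Z, T) to a 3D volume holding
    the voxel-wise lag-1 auto-correlation of each time series."""
    x = fmri - fmri.mean(axis=-1, keepdims=True)
    num = (x[..., :-1] * x[..., 1:]).sum(axis=-1)
    den = (x ** 2).sum(axis=-1)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def alff(fmri, tr=2.0, low=0.01, high=0.1):
    """Amplitude of low-frequency fluctuations: mean spectral amplitude
    of each voxel's time series in the 0.01-0.1 Hz band (TR assumed)."""
    t = fmri.shape[-1]
    freqs = np.fft.rfftfreq(t, d=tr)
    amp = np.abs(np.fft.rfft(fmri, axis=-1))
    band = (freqs >= low) & (freqs <= high)
    return amp[..., band].mean(axis=-1)

# Each 3D summary volume (or a channel-stacked combination of several)
# can then be fed to a standard 3D-CNN classifier.
```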
Collapse
|
28
|
Deep learning-based interpretation of basal/acetazolamide brain perfusion SPECT leveraging unstructured reading reports. Eur J Nucl Med Mol Imaging 2020; 47:2186-2196. [PMID: 31912255 DOI: 10.1007/s00259-019-04670-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2019] [Accepted: 12/23/2019] [Indexed: 12/27/2022]
Abstract
PURPOSE Basal/acetazolamide brain perfusion single-photon emission computed tomography (SPECT) has been used to evaluate functional hemodynamics in patients with carotid artery stenosis. We aimed to develop a deep learning model as a support system for interpreting brain perfusion SPECT by leveraging unstructured text reports. METHODS In total, 7345 basal/acetazolamide brain perfusion SPECT images and their text reports were retrospectively collected. A long short-term memory (LSTM) network was trained on 500 randomly selected text reports to predict manually labeled structured information, including abnormalities of basal perfusion and vascular reserve for each vascular territory. Using this trained LSTM model, we extracted structured information from the remaining 6845 text reports to develop a deep learning model for interpreting SPECT images. The model was based on a 3D convolutional neural network (CNN), and its performance was tested on another 500 cases by measuring the area under the receiver-operating characteristic curve (AUC). We then applied the model to patients who underwent revascularization (n = 33) to compare the estimated output of the CNN model for pre- and post-revascularization SPECT with clinical outcomes. RESULTS The AUC of the LSTM model for extracting structured labels was 1.00 for basal perfusion and 0.99 for vascular reserve across all 9 brain regions. The AUC of the CNN model designed to identify abnormal perfusion was 0.83 for basal perfusion and 0.89 for vascular reserve. The output of the CNN model improved significantly after revascularization of the target vascular territory, and its changes across brain territories were concordant with clinical outcomes. CONCLUSION We developed a deep learning model that supports the interpretation of brain perfusion SPECT by converting unstructured text reports into structured labels. This model can be used as a support system not only to identify perfusion abnormalities but also to provide quantitative abnormality scores, particularly for patients who require revascularization.
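A minimal sketch of the report-labeling stage is shown below (a PyTorch illustration under our own assumptions; the class name, vocabulary size, and dimensions are hypothetical): an embedding plus LSTM encodes the tokenized report, and a multi-label sigmoid head predicts abnormality flags for 9 territories × 2 findings, matching the structured labels described above.

```python
import torch
import torch.nn as nn

class ReportLabeler(nn.Module):
    """Hypothetical sketch of the report-labeling LSTM: a tokenized
    free-text report is mapped to structured abnormality labels for
    9 vascular territories x 2 findings (basal perfusion, vascular
    reserve). Vocabulary size and dimensions are illustrative."""
    def __init__(self, vocab_size=5000, emb=64, hid=128,
                 n_regions=9, n_findings=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb, padding_idx=0)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.head = nn.Linear(hid, n_regions * n_findings)

    def forward(self, token_ids):             # (batch, seq_len) of word ids
        x = self.embed(token_ids)
        _, (h, _) = self.lstm(x)              # h: (1, batch, hid)
        logits = self.head(h[-1])             # (batch, 18)
        return torch.sigmoid(logits)          # multi-label probabilities

# The predicted labels can then supervise a 3D CNN that takes the
# basal/acetazolamide SPECT volumes as input.
```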
Collapse
|
29
|
Automatic classification of lung nodule candidates based on a novel 3D convolution network and knowledge transferred from a 2D network. Med Phys 2019; 46:5499-5513. [PMID: 31621916 DOI: 10.1002/mp.13867] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2019] [Revised: 10/01/2019] [Accepted: 10/02/2019] [Indexed: 12/19/2022] Open
Abstract
OBJECTIVE In an automatic lung nodule detection system, a large number of nodule candidates must be judged as true or false nodules, which is a classification task. However, the variable shapes and sizes of lung nodules pose a great challenge to the classification of candidates. To solve this problem, we propose a method for classifying nodule candidates with a three-dimensional (3D) convolutional neural network (ConvNet) model that is trained by transferring knowledge from a multiresolution two-dimensional (2D) ConvNet model. METHODS In this scheme, a novel 3D ConvNet model is pre-weighted with the weights of the trained 2D ConvNet model, and the 3D ConvNet model is then trained on 3D image volumes. In this way, the knowledge transfer method makes the 3D network easier to converge and makes full use of the spatial information of nodules of different sizes and shapes to improve classification accuracy. RESULTS Experimental results on 551,065 pulmonary nodule candidates from the LUNA16 dataset show that our method achieves a competitive average score in the false-positive reduction track of lung nodule detection, with sensitivities of 0.619 and 0.642 at 0.125 and 0.25 false positives per scan, respectively. CONCLUSIONS The proposed method maintains satisfactory classification accuracy for nodules of different sizes and shapes even when the false-positive rate is extremely small. Moreover, the method of transferring knowledge from a 2D ConvNet to a 3D ConvNet is, to our knowledge, the first attempt to fully migrate the parameters of all layer types, including convolutional layers, fully connected layers, and the classifier, between models of different dimensionality, which facilitates reuse of existing 2D ConvNet resources and the generalization of transfer learning schemes.
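The paper migrates parameters across all layer types; one widely used way to pre-weight 3D convolutions from trained 2D ones is kernel "inflation", sketched below in PyTorch (our illustration of the general idea; the authors' exact migration scheme may differ).

```python
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d, depth=3):
    """Pre-weight a 3D conv from a trained 2D conv by replicating the 2D
    kernel along the depth axis and dividing by the depth, so the initial
    3D response to a depth-constant input matches the 2D response. This
    inflation trick is one common transfer scheme; the paper's migration
    of conv, fully connected, and classifier layers may differ."""
    kh, kw = conv2d.kernel_size
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       (depth, kh, kw),
                       padding=(depth // 2,) + tuple(conv2d.padding))
    with torch.no_grad():
        w = conv2d.weight.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
        conv3d.weight.copy_(w)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

# Example: initialize a 3D layer from a trained 2D layer, then fine-tune
# the resulting 3D ConvNet on nodule volumes.
layer2d = nn.Conv2d(1, 16, kernel_size=3, padding=1)
layer3d = inflate_conv2d_to_3d(layer2d)
```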
Collapse
|
30
|
Multiscale brain MRI super-resolution using deep 3D convolutional networks. Comput Med Imaging Graph 2019; 77:101647. [PMID: 31493703 DOI: 10.1016/j.compmedimag.2019.101647] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2018] [Revised: 06/18/2019] [Accepted: 08/01/2019] [Indexed: 10/26/2022]
Abstract
The purpose of super-resolution approaches is to overcome the hardware limitations and clinical requirements of imaging procedures by reconstructing high-resolution images from low-resolution acquisitions using post-processing methods. Super-resolution techniques could have a strong impact on structural magnetic resonance imaging, for instance when focusing on cortical surface or fine-scale structure analysis. In this paper, we study deep three-dimensional convolutional neural networks for the super-resolution of brain magnetic resonance imaging data. First, our work examines the relevance of several factors in the performance of purely convolutional neural network-based techniques for monomodal super-resolution: optimization method, weight initialization, network depth, residual learning, filter size in convolution layers, number of filters, training patch size, and number of training subjects. Second, our study highlights that a single network can efficiently handle multiple arbitrary scaling factors based on a multiscale training approach. Third, we extend our super-resolution networks to multimodal super-resolution using intermodality priors. Fourth, we investigate the impact of transfer learning on super-resolution performance in terms of generalization among different datasets. Lastly, the learnt models are used to enhance real clinical low-resolution images. The results tend to demonstrate the potential of deep neural networks for practical medical image applications.
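The multiscale training idea, one network serving arbitrary scaling factors because inputs are first resampled to the target grid, can be sketched as follows (a minimal PyTorch illustration with illustrative depth and channel counts, not the paper's exact architecture).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRNet3D(nn.Module):
    """Minimal residual 3D CNN for MRI super-resolution (a sketch of the
    general recipe, not the paper's configuration): the network predicts
    the residual between an interpolated low-resolution volume and the
    high-resolution target."""
    def __init__(self, ch=32, depth=6):
        super().__init__()
        layers = [nn.Conv3d(1, ch, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv3d(ch, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, lr_volume, scale):
        # Resample to the target grid first, so one network can serve
        # multiple arbitrary scaling factors; multiscale training simply
        # samples a different `scale` per batch.
        up = F.interpolate(lr_volume, scale_factor=scale,
                           mode='trilinear', align_corners=False)
        return up + self.body(up)              # residual learning
```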
Collapse
|
31
|
Deep learning approaches using 2D and 3D convolutional neural networks for generating male pelvic synthetic computed tomography from magnetic resonance imaging. Med Phys 2019; 46:3788-3798. [PMID: 31220353 DOI: 10.1002/mp.13672] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2018] [Revised: 06/05/2019] [Accepted: 06/10/2019] [Indexed: 01/17/2023] Open
Abstract
PURPOSE The improved soft tissue contrast of magnetic resonance imaging (MRI) compared to computed tomography (CT) makes it a useful imaging modality for radiotherapy treatment planning. Even when MR images are acquired for treatment planning, standard clinical practice currently also requires a CT for dose calculation and x-ray-based patient positioning. This increases workload, introduces uncertainty due to the required inter-modality image registrations, and involves unnecessary irradiation. While it would be beneficial to use MR images exclusively, a method is needed to estimate a synthetic CT (sCT) for generating electron density maps and patient positioning reference images. We investigated 2D and 3D convolutional neural networks (CNNs) for generating a male pelvic sCT from a T1-weighted MR image and compared their performance. METHODS A retrospective study was performed using CTs and T1-weighted MR images of 20 prostate cancer patients. CTs were deformably registered to MR images to create CT-MR pairs for training the networks. The proposed 2D CNN, which contained 27 convolutional layers, was modified from a state-of-the-art 2D CNN to save computational memory and to prepare for building the 3D CNN. The proposed 2D and 3D models were trained from scratch to map the intensities of T1-weighted MR images to CT Hounsfield unit (HU) values. Each sCT was generated in a fivefold cross-validation framework and compared with the corresponding deformed CT (dCT) using the voxel-wise mean absolute error (MAE). The geometric accuracy of the sCTs was evaluated by comparing bone regions, defined by thresholding the dCTs and sCTs at 150 HU, using the Dice similarity coefficient (DSC), recall, and precision. To evaluate sCT patient positioning accuracy, bone regions in the dCTs and sCTs were rigidly registered to the corresponding cone-beam CTs, and the resulting paired Euler transformation vectors were compared by calculating translation vector distances and absolute differences of the Euler angles. Statistical tests were performed to evaluate the differences among the proposed models and Han's model. RESULTS Generating a pelvic sCT required approximately 5.5 s using the proposed models. The average MAEs within the body contour were 40.5 ± 5.4 HU (mean ± SD) and 37.6 ± 5.1 HU for the 2D and 3D CNNs, respectively. The average DSC, recall, and precision for the bone region (thresholding the CT at 150 HU) were 0.81 ± 0.04, 0.85 ± 0.04, and 0.77 ± 0.09 for the 2D CNN, and 0.82 ± 0.04, 0.84 ± 0.04, and 0.80 ± 0.08 for the 3D CNN, respectively. For both models, the mean translation vector distances were less than 0.6 mm, with mean absolute differences of the Euler angles below 0.5°. CONCLUSIONS The 2D and 3D CNNs generated accurate pelvic sCTs for the 20 patients from T1-weighted MR images. Statistical tests indicated that the proposed 3D model generated sCTs with a smaller MAE and higher bone region precision than the 2D models. The results of the patient alignment tests suggest that sCTs generated by the proposed CNNs can provide accurate patient positioning. The accuracy of dose calculations using the generated sCTs will be tested and compared across the proposed models in future work.
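The evaluation metrics described above are straightforward to reproduce. The NumPy sketch below computes the MAE within a body contour and the DSC, recall, and precision of bone regions thresholded at 150 HU (a minimal illustration assuming a boolean body mask and nonempty bone regions).

```python
import numpy as np

def mae_in_body(sct, dct, body_mask):
    """Voxel-wise mean absolute error (HU) within the body contour."""
    return np.abs(sct - dct)[body_mask].mean()

def bone_overlap(sct, dct, thresh=150):
    """DSC, recall, and precision for bone regions obtained by
    thresholding both volumes at 150 HU, as in the evaluation above."""
    bone_s, bone_d = sct > thresh, dct > thresh
    tp = np.logical_and(bone_s, bone_d).sum()
    dsc = 2 * tp / (bone_s.sum() + bone_d.sum())
    recall = tp / bone_d.sum()        # fraction of true bone recovered
    precision = tp / bone_s.sum()     # fraction of predicted bone correct
    return dsc, recall, precision
```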
Collapse
|
32
|
Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network. SENSORS 2019; 19:s19112472. [PMID: 31151184 PMCID: PMC6603512 DOI: 10.3390/s19112472] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/01/2019] [Revised: 05/23/2019] [Accepted: 05/24/2019] [Indexed: 11/18/2022]
Abstract
The worldwide deployment of surveillance cameras in smart cities has enabled researchers to analyze a gigantic volume of data for automatic monitoring. Enhanced security systems in smart cities, schools, hospitals, and other surveillance domains are essential for detecting violent or abnormal activities in time to avoid casualties that could cause social, economic, and ecological damage. Automatic violence detection enabling quick action is highly significant and can efficiently assist the departments concerned. In this paper, we propose a triple-staged, end-to-end deep learning framework for violence detection. First, persons are detected in the surveillance video stream using a lightweight convolutional neural network (CNN) model, which reduces the voluminous processing of irrelevant frames. Second, a sequence of 16 frames containing detected persons is passed to a 3D CNN, where the spatiotemporal features of the sequence are extracted and fed to a softmax classifier. Furthermore, we optimized the 3D CNN model using an open visual inference and neural network optimization toolkit developed by Intel, which converts the trained model into an intermediate representation and tunes it for optimal execution on the target platform for the final prediction of violent activity. After a violent activity is detected, an alert is transmitted to the nearest police station or security department so that prompt preventive action can be taken. We found that our proposed method outperforms existing state-of-the-art methods on different benchmark datasets.
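The triple-staged pipeline can be outlined in a few lines of Python. In the sketch below, `person_detector`, `c3d_model`, and `alert_fn` are placeholders for whatever implementations are used, and the violence class index and alert threshold are assumptions.

```python
import collections
import torch
import torch.nn.functional as F

CLIP_LEN = 16   # frames per 3D-CNN input, as in the pipeline above

def monitor_stream(frames, person_detector, c3d_model, alert_fn,
                   threshold=0.8):
    """Sketch of the triple-staged pipeline: frames without persons are
    skipped, sequences of 16 person-containing frames are classified by
    a 3D CNN, and an alert is raised on a high-confidence prediction."""
    buffer = collections.deque(maxlen=CLIP_LEN)
    for frame in frames:                        # frame: (C, H, W) tensor
        if not person_detector(frame):          # stage 1: skip useless frames
            continue
        buffer.append(frame)
        if len(buffer) == CLIP_LEN:             # stage 2: 3D CNN + softmax
            clip = torch.stack(list(buffer), dim=1).unsqueeze(0)  # (1,C,T,H,W)
            probs = F.softmax(c3d_model(clip), dim=1)
            if probs[0, 1].item() > threshold:  # class 1 = violence (assumed)
                alert_fn(probs[0, 1].item())    # stage 3: notify security
            buffer.clear()
```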
Collapse
|
33
|
Automatic lung nodule detection using a 3D deep convolutional neural network combined with a multi-scale prediction strategy in chest CTs. Comput Biol Med 2018; 103:220-231. [PMID: 30390571 DOI: 10.1016/j.compbiomed.2018.10.011] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2018] [Revised: 10/11/2018] [Accepted: 10/11/2018] [Indexed: 12/17/2022]
Abstract
OBJECTIVE A novel computer-aided detection (CAD) scheme for lung nodule detection using a 3D deep convolutional neural network combined with a multi-scale prediction strategy is proposed to assist radiologists by providing a second opinion on lung nodule detection, a crucial step in the early diagnosis of lung cancer. METHOD A 3D deep convolutional neural network (CNN) with multi-scale prediction was used to detect lung nodules after the lungs were segmented from chest CT scans with a comprehensive method. Compared with a 2D CNN, a 3D CNN can exploit richer spatial 3D contextual information and, after being trained on 3D samples, generate more discriminative features that fully represent lung nodules. Furthermore, a multi-scale lung nodule prediction strategy, comprising multi-scale cube prediction and cube clustering, is proposed to detect extremely small nodules. RESULT The proposed method was evaluated on 888 thin-slice scans with 1186 nodules in the LUNA16 database. All results were obtained via 10-fold cross-validation. Three configurations of the proposed scheme are provided, to be selected according to practical needs. The sensitivity of the scheme with the primary configuration reached 87.94% and 92.93% at one and four false positives per scan, respectively, and the competition performance metric (CPM) score reached 0.7967. CONCLUSION The experimental results demonstrate the outstanding detection performance of the proposed nodule detection scheme. In addition, the scheme can be extended to other medical image recognition fields.
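The multi-scale prediction strategy can be sketched as follows (our own NumPy/SciPy illustration; cube sizes, thresholds, and the greedy clustering rule are assumptions rather than the paper's exact procedure): cubes of several sizes are cropped around a candidate voxel, resampled to the network input size, and their probabilities averaged, after which nearby positive cubes are merged.

```python
import numpy as np
from scipy.ndimage import zoom

CUBE_SIZES = (20, 30, 40)   # multi-scale cubes (voxels); illustrative values

def multiscale_predict(volume, center, model, input_size=32):
    """Multi-scale cube prediction sketch: crop cubes of several sizes
    around a candidate (assumed away from volume borders), resample each
    to the network input size, and average the nodule probabilities."""
    probs = []
    for s in CUBE_SIZES:
        half = s // 2
        z, y, x = center
        cube = volume[z - half:z + half, y - half:y + half, x - half:x + half]
        cube = zoom(cube, input_size / s)        # resample to model input
        probs.append(model(cube))                # model returns P(nodule)
    return float(np.mean(probs))

def cluster_cubes(centers, probs, min_dist=10.0, min_prob=0.5):
    """Greedy cube clustering: merge positive cubes whose centers lie
    within `min_dist` voxels, keeping the highest-probability center."""
    order = np.argsort(probs)[::-1]
    kept = []
    for i in order:
        if probs[i] < min_prob:
            break
        c = np.asarray(centers[i])
        if all(np.linalg.norm(c - k) >= min_dist for k in kept):
            kept.append(c)
    return kept
```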
Collapse
|
34
|
Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med Image Anal 2016; 36:61-78. [PMID: 27865153 DOI: 10.1016/j.media.2016.10.004] [Citation(s) in RCA: 1292] [Impact Index Per Article: 161.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2016] [Revised: 09/09/2016] [Accepted: 10/12/2016] [Indexed: 12/13/2022]
Abstract
We propose a dual-pathway, 11-layer-deep, three-dimensional convolutional neural network for the challenging task of brain lesion segmentation. The devised architecture is the result of an in-depth analysis of the limitations of current networks proposed for similar applications. To overcome the computational burden of processing 3D medical scans, we devised an efficient and effective dense training scheme that joins the processing of adjacent image patches into one pass through the network while automatically adapting to the inherent class imbalance present in the data. Further, we analyze the development of deeper, and thus more discriminative, 3D CNNs. To incorporate both local and larger contextual information, we employ a dual-pathway architecture that processes the input images at multiple scales simultaneously. For post-processing of the network's soft segmentation, we use a 3D fully connected conditional random field, which effectively removes false positives. Our pipeline is extensively evaluated on three challenging tasks of lesion segmentation in multi-channel MRI patient data with traumatic brain injuries, brain tumours, and ischemic stroke. We improve on the state of the art for all three applications, with top-ranking performance on the public benchmarks BRATS 2015 and ISLES 2015. Our method is computationally efficient, which allows its adoption in a variety of research and clinical settings. The source code of our implementation is made publicly available.
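The dual-pathway idea can be sketched compactly in PyTorch (an illustration with toy channel counts and depths, not the 11-layer configuration or the dense training scheme of the paper): one pathway processes the normal-resolution patch, the other a downsampled patch covering a larger region around the same center, and their feature maps are fused for voxel-wise classification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathway3D(nn.Module):
    """Minimal dual-pathway 3D segmentation sketch: a normal-resolution
    pathway and a low-resolution context pathway over the same center,
    fused and classified voxel-wise. Channel counts and depths are
    illustrative; in_ch=4 assumes four MRI modalities per voxel."""
    def __init__(self, in_ch=4, ch=30, n_classes=5):
        super().__init__()
        def pathway():
            return nn.Sequential(
                nn.Conv3d(in_ch, ch, 3, padding=1), nn.ReLU(),
                nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU())
        self.normal = pathway()
        self.context = pathway()
        self.classifier = nn.Conv3d(2 * ch, n_classes, 1)  # voxel-wise head

    def forward(self, patch, context_patch):
        f_n = self.normal(patch)
        f_c = self.context(context_patch)
        # upsample low-resolution context features onto the normal grid
        f_c = F.interpolate(f_c, size=f_n.shape[2:], mode='trilinear',
                            align_corners=False)
        return self.classifier(torch.cat([f_n, f_c], dim=1))
```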
Collapse
|