1
Wang R, Mu Z, Wang J, Wang K, Liu H, Zhou Z, Jiao L. ASF-LKUNet: Adjacent-scale fusion U-Net with large kernel for multi-organ segmentation. Comput Biol Med 2024;181:109050. PMID: 39205343. DOI: 10.1016/j.compbiomed.2024.109050.
Abstract
Multi-organ segmentation of medical images faces several challenges, including complex backgrounds, blurred boundaries between organs, and large differences in organ volume. Because conventional convolution operations have local receptive fields, applying them directly to multi-organ segmentation rarely yields desirable results. Transformer-based models capture global information but depend heavily on hardware because of their high computational demands. Depthwise convolution with a large kernel, by contrast, can capture global information at a lower computational cost. Therefore, to leverage a large receptive field while reducing model complexity, we propose a novel CNN-based approach, the adjacent-scale fusion U-Net with large kernel (ASF-LKUNet), for multi-organ segmentation. ASF-LKUNet uses a U-shaped encoder-decoder as its base architecture. In the encoder path, we design a large-kernel residual block that combines large and small kernels to capture global and local features simultaneously. Furthermore, for the first time, we propose an adjacent-scale fusion mechanism and a large-kernel GRN channel attention that incorporate low-level details into high-level semantics via adjacent-scale features and then adaptively focus on the more global and meaningful channel information. Extensive experiments and interpretability analyses were conducted on the Synapse multi-organ dataset (Synapse) and the ACDC cardiac multi-structure dataset (ACDC). ASF-LKUNet achieves DSC scores of 88.41% and 89.45% on Synapse and ACDC, respectively, with 17.96M parameters and 29.14 GFLOPs, outperforming ten competing approaches at lower model complexity. Code and the trained models have been released on GitHub.
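To make the large-kernel idea concrete, the following minimal PyTorch sketch pairs a depthwise large-kernel convolution (global context) with a small 3x3 convolution (local detail) in a residual block. The kernel size, normalization, and activation here are illustrative assumptions; the authors' released GitHub code defines the actual block.

```python
# A minimal sketch of a large-kernel residual block in the spirit of ASF-LKUNet.
# Kernel sizes, normalization, and activation are assumptions, not the paper's design.
import torch
import torch.nn as nn

class LargeKernelResidualBlock(nn.Module):
    def __init__(self, channels: int, large_kernel: int = 13):
        super().__init__()
        # Depthwise convolution with a large kernel: wide receptive field,
        # cheap because each channel is filtered independently.
        self.large = nn.Conv2d(channels, channels, large_kernel,
                               padding=large_kernel // 2, groups=channels)
        # Standard small-kernel convolution for local features.
        self.small = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual fusion of the global (large-kernel) and local (small-kernel) paths.
        return self.act(self.norm(x + self.large(x) + self.small(x)))

x = torch.randn(1, 64, 56, 56)                      # a 64-channel feature map
print(LargeKernelResidualBlock(64)(x).shape)        # torch.Size([1, 64, 56, 56])
```

Because the large kernel is depthwise, its parameter count grows with kernel area times channels rather than channels squared, which is why it stays cheap relative to a dense large-kernel convolution.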
Affiliation(s)
- Rongfang Wang
  - School of Artificial Intelligence, Xidian University, China
- Zhaoshan Mu
  - School of Artificial Intelligence, Xidian University, China
- Jing Wang
  - Department of Radiation Oncology, UTSW, United States of America
- Kai Wang
  - Department of Radiation Oncology, UMMC, United States of America
- Hui Liu
  - Department of Biostatistics Data Science, KUMC, United States of America
- Zhiguo Zhou
  - Department of Biostatistics Data Science, KUMC, United States of America
- Licheng Jiao
  - School of Artificial Intelligence, Xidian University, China
2
Wang S, Liang S, Chang Q, Zhang L, Gong B, Bai Y, Zuo F, Wang Y, Xie X, Gu Y. STSN-Net: Simultaneous Tooth Segmentation and Numbering Method in Crowded Environments with Deep Learning. Diagnostics (Basel) 2024;14:497. PMID: 38472969. DOI: 10.3390/diagnostics14050497.
Abstract
Accurate tooth segmentation and numbering are the cornerstones of efficient automatic dental diagnosis and treatment. In this paper, a multitask learning architecture has been proposed for accurate tooth segmentation and numbering in panoramic X-ray images. A graph convolution network was applied for the automatic annotation of the target region, a modified convolutional neural network-based detection subnetwork (DSN) was used for tooth recognition and boundary regression, and an effective region segmentation subnetwork (RSSN) was used for region segmentation. The features extracted using RSSN and DSN were fused to optimize the quality of boundary regression, which provided impressive results for multiple evaluation metrics. Specifically, the proposed framework achieved a top F1 score of 0.9849, a top Dice metric score of 0.9629, and an mAP (IOU = 0.5) score of 0.9810. This framework holds great promise for enhancing the clinical efficiency of dentists in tooth segmentation and numbering tasks.
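The multitask structure, a shared backbone feeding a detection head (tooth numbering and box regression) alongside a segmentation head, can be sketched schematically as below. This toy PyTorch model does not reproduce the paper's DSN/RSSN designs; all module names and sizes are illustrative assumptions.

```python
# A schematic multitask sketch: shared features feed detection and segmentation heads.
# This is not the paper's architecture; shapes and heads are illustrative assumptions.
import torch
import torch.nn as nn

class ToothMultiTaskNet(nn.Module):
    def __init__(self, num_teeth: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head: tooth-number logits and a coarse box (x, y, w, h).
        self.det_cls = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                     nn.Linear(64, num_teeth))
        self.det_box = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                     nn.Linear(64, 4))
        # Segmentation head: upsample back to input resolution, one mask channel.
        self.seg = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2),
        )

    def forward(self, x):
        f = self.backbone(x)  # shared features used by all three outputs
        return self.det_cls(f), self.det_box(f), torch.sigmoid(self.seg(f))

img = torch.randn(2, 1, 128, 128)                 # grayscale panoramic X-ray crops
cls_logits, boxes, masks = ToothMultiTaskNet()(img)
print(cls_logits.shape, boxes.shape, masks.shape)  # (2, 32) (2, 4) (2, 1, 128, 128)
```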
Affiliation(s)
- Shaofeng Wang
  - Department of Orthodontics, Beijing Stomatological Hospital, Capital Medical University, Beijing 100050, China
- Shuang Liang
  - School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
  - Laboratory for Clinical Medicine, Capital Medical University, Beijing 100069, China
  - Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Qiao Chang
  - Department of Orthodontics, Beijing Stomatological Hospital, Capital Medical University, Beijing 100050, China
- Li Zhang
  - Department of Orthodontics, Beijing Stomatological Hospital, Capital Medical University, Beijing 100050, China
- Beiwen Gong
  - Department of Orthodontics, Beijing Stomatological Hospital, Capital Medical University, Beijing 100050, China
- Yuxing Bai
  - Department of Orthodontics, Beijing Stomatological Hospital, Capital Medical University, Beijing 100050, China
  - Laboratory for Clinical Medicine, Capital Medical University, Beijing 100069, China
- Feifei Zuo
  - LargeV Instrument Corp., Ltd., Beijing 100084, China
- Yajie Wang
  - LargeV Instrument Corp., Ltd., Beijing 100084, China
- Xianju Xie
  - Department of Orthodontics, Beijing Stomatological Hospital, Capital Medical University, Beijing 100050, China
  - Laboratory for Clinical Medicine, Capital Medical University, Beijing 100069, China
- Yu Gu
  - School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
  - Laboratory for Clinical Medicine, Capital Medical University, Beijing 100069, China
3
Pu L, Leader JK, Ali A, Geng Z, Wilson D. Predicting left/right lung volumes, thoracic cavity volume, and heart volume from subject demographics to improve lung transplant. J Med Imaging (Bellingham) 2023;10:051806. PMID: 37077858. PMCID: PMC10108239. DOI: 10.1117/1.jmi.10.5.051806.
Abstract
Purpose Lung transplantation is the standard treatment for end-stage lung diseases. A crucial factor affecting its success is size matching between the donor's lungs and the recipient's thorax. Computed tomography (CT) scans can accurately determine the recipient's lung size, but the donor's lung size is often unknown because medical images are unavailable. We aim to predict the donor's right/left/total lung volume, thoracic cavity volume, and heart volume from subject demographics alone to improve the accuracy of size matching. Approach A cohort of 4610 subjects with chest CT scans and basic demographics (i.e., age, gender, race, smoking status, smoking history, weight, and height) was used in this study. The right and left lungs, thoracic cavity, and heart depicted on chest CT scans were automatically segmented using U-Net, and their volumes were computed. Eight machine learning models (i.e., random forest, multivariate linear regression, support vector machine, extreme gradient boosting (XGBoost), multilayer perceptron (MLP), decision tree, k-nearest neighbors, and Bayesian regression) were developed and used to predict the volume measures from subject demographics. The 10-fold cross-validation method was used to evaluate the performance of the prediction models. R-squared (R²), mean absolute error (MAE), and mean absolute percentage error (MAPE) were used as performance metrics. Results The MLP model demonstrated the best performance for predicting the thoracic cavity volume (R²: 0.628, MAE: 0.736 L, MAPE: 10.9%), right lung volume (R²: 0.501, MAE: 0.383 L, MAPE: 13.9%), and left lung volume (R²: 0.507, MAE: 0.365 L, MAPE: 15.2%), and the XGBoost model demonstrated the best performance for predicting the total lung volume (R²: 0.514, MAE: 0.728 L, MAPE: 14.0%) and heart volume (R²: 0.430, MAE: 0.075 L, MAPE: 13.9%). Conclusions Our results demonstrate the feasibility of predicting lung, heart, and thoracic cavity volumes from subject demographics, with performance superior to that of available studies on lung volume prediction.
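The evaluation protocol, 10-fold cross-validation of demographic regressors scored by R², MAE, and MAPE, can be reproduced in outline with scikit-learn. The synthetic data and feature encoding below are placeholders for the real cohort, and the sketch assumes the xgboost package is installed.

```python
# A minimal sketch of the study's protocol: predict a volume (in liters) from
# demographics with 10-fold cross-validation, reporting R^2, MAE, and MAPE.
# The synthetic data and feature set are stand-ins for the real cohort.
import numpy as np
from sklearn.model_selection import cross_validate, KFold
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor  # assumes the xgboost package is installed

rng = np.random.default_rng(0)
n = 500
# Placeholder demographics: age, sex, smoking status, pack-years, weight, height.
X = rng.normal(size=(n, 6))
y = 4.0 + 0.8 * X[:, 5] + 0.3 * X[:, 4] + rng.normal(scale=0.5, size=n)  # volume (L)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for name, model in [("RF", RandomForestRegressor(random_state=0)),
                    ("MLP", MLPRegressor(max_iter=2000, random_state=0)),
                    ("XGB", XGBRegressor(random_state=0))]:
    scores = cross_validate(model, X, y, cv=cv,
                            scoring=("r2", "neg_mean_absolute_error",
                                     "neg_mean_absolute_percentage_error"))
    print(f"{name}: R2={scores['test_r2'].mean():.3f} "
          f"MAE={-scores['test_neg_mean_absolute_error'].mean():.3f} L "
          f"MAPE={-scores['test_neg_mean_absolute_percentage_error'].mean():.1%}")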
Affiliation(s)
- Lucas Pu
  - North Allegheny High School, Wexford, Pennsylvania, United States
- Joseph K Leader
  - University of Pittsburgh School of Medicine, Department of Radiology, Pittsburgh, Pennsylvania, United States
- Alaa Ali
  - University of Pittsburgh School of Medicine, Department of Radiology, Pittsburgh, Pennsylvania, United States
- Zihan Geng
  - Carnegie Mellon University, Department of Statistics and Data Science, Pittsburgh, Pennsylvania, United States
- David Wilson
  - University of Pittsburgh School of Medicine, Department of Medicine, Pittsburgh, Pennsylvania, United States
4
Jiang X, Hu Z, Wang S, Zhang Y. Deep Learning for Medical Image-Based Cancer Diagnosis. Cancers (Basel) 2023;15:3608. PMID: 37509272. PMCID: PMC10377683. DOI: 10.3390/cancers15143608.
Abstract
(1) Background: Applying deep learning to cancer diagnosis based on medical images is a research hotspot in artificial intelligence and computer vision. Cancer diagnosis demands very high accuracy and timeliness, medical imaging has inherent particularity and complexity, and deep learning methods are developing rapidly; a comprehensive review of relevant studies is therefore necessary to help readers understand the current research status and ideas. (2) Methods: Five types of radiological images, namely X-ray, ultrasound (US), computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), as well as histopathological images, are reviewed in this paper. The basic architecture of deep learning and classical pretrained models are comprehensively reviewed. In particular, advanced techniques emerging in recent years, including transfer learning, ensemble learning (EL), graph neural networks, and the vision transformer (ViT), are introduced. Overfitting-prevention methods are summarized, including batch normalization, dropout, weight initialization, and data augmentation. The application of deep learning in medical image-based cancer analysis is then surveyed. (3) Results: Deep learning has achieved great success in medical image-based cancer diagnosis, showing good results in image classification, image reconstruction, image detection, image segmentation, image registration, and image synthesis. However, the lack of high-quality labeled datasets limits the role of deep learning, and challenges remain in rare cancer diagnosis, multi-modal image fusion, model explainability, and generalization. (4) Conclusions: More public standard cancer databases are needed. Pretrained models based on deep neural networks can still be improved, and special attention should be paid to multimodal data fusion and the supervised paradigm. Technologies such as ViT, ensemble learning, and few-shot learning will bring surprises to cancer diagnosis based on medical images.
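For readers who want the listed overfitting-prevention techniques in code, a compact PyTorch/torchvision sketch follows; the tiny model and augmentation choices are illustrative only.

```python
# A compact illustration of the overfitting-prevention techniques the review lists:
# batch normalization, dropout, explicit weight initialization, and data augmentation.
import torch
import torch.nn as nn
from torchvision import transforms

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),          # batch normalization
    nn.ReLU(),
    nn.Dropout2d(0.25),          # dropout
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 2),
)

# Weight initialization: Kaiming (He) initialization for the conv layers.
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")

# Data augmentation: random flips and rotations applied on the fly during training.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
])
print(model(augment(torch.randn(4, 3, 64, 64))).shape)  # torch.Size([4, 2])
```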
Grants
- RM32G0178B8 BBSRC
- MC_PC_17171 MRC, UK
- RP202G0230 Royal Society, UK
- AA/18/3/34220 BHF, UK
- RM60G0680 Hope Foundation for Cancer Research, UK
- P202PF11 GCRF, UK
- RP202G0289 Sino-UK Industrial Fund, UK
- P202ED10, P202RE969 LIAS, UK
- P202RE237 Data Science Enhancement Fund, UK
- 24NN201 Fight for Sight, UK
- OP202006 Sino-UK Education Fund, UK
- RM32G0178B8 BBSRC, UK
- 2023SJZD125 Major project of philosophy and social science research in colleges and universities in Jiangsu Province, China
Affiliation(s)
- Xiaoyan Jiang
  - School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing 210038, China
- Zuojin Hu
  - School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing 210038, China
- Shuihua Wang
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
- Yudong Zhang
  - School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
5
Huang KW, Yang YR, Huang ZH, Liu YY, Lee SH. Retinal Vascular Image Segmentation Using Improved UNet Based on Residual Module. Bioengineering (Basel) 2023;10:722. PMID: 37370653. DOI: 10.3390/bioengineering10060722.
Abstract
In recent years, deep learning technology for clinical diagnosis has progressed considerably, and the value of medical imaging continues to increase. In the past, clinicians evaluated medical images according to their individual expertise. In contrast, applying artificial intelligence for automatic analysis and diagnostic assistance, so that clinicians can evaluate medical information more efficiently, has become an important trend. In this study, we propose an architecture for segmenting images of retinal blood vessels based on an improved U-Net neural network model. The proposed model incorporates a residual module to extract features more effectively and includes full-scale skip connections that combine low-level details with high-level features at different scales. An experimental evaluation shows that the model segments retinal vessel images accurately. On the benchmark DRIVE and ROSE datasets, the proposed method also outperformed several existing models, including U-Net, ResUNet, U-Net 3+, ResUNet++, and CaraNet.
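The full-scale skip connection idea, fusing encoder features from several scales into a single decoder stage (as in UNet 3+), can be sketched as below; the channel counts are illustrative assumptions.

```python
# A sketch of a full-scale skip connection: every encoder feature map is resized
# to the decoder stage's spatial size and concatenated along the channel axis.
import torch
import torch.nn.functional as F

def full_scale_fuse(decoder_feat, encoder_feats):
    """Resize each encoder feature map to the decoder's spatial size, then concat."""
    h, w = decoder_feat.shape[-2:]
    resized = [F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
               for f in encoder_feats]
    return torch.cat([decoder_feat] + resized, dim=1)

dec = torch.randn(1, 64, 32, 32)                  # one decoder stage
encs = [torch.randn(1, 32, 128, 128),             # shallow scale: fine details
        torch.randn(1, 64, 64, 64),               # middle scale
        torch.randn(1, 128, 16, 16)]              # deep scale: semantics
print(full_scale_fuse(dec, encs).shape)           # torch.Size([1, 288, 32, 32])
```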
Affiliation(s)
- Ko-Wei Huang
  - Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
- Yao-Ren Yang
  - Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
- Zih-Hao Huang
  - Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
- Yi-Yang Liu
  - Department of Electrical Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
  - Department of Urology, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University College of Medicine, Kaohsiung 83301, Taiwan
- Shih-Hsiung Lee
  - Department of Intelligent Commerce, National Kaohsiung University of Science and Technology, Kaohsiung 82444, Taiwan
6
Gezer NS, Bandos AI, Beeche CA, Leader JK, Dhupar R, Pu J. CT-derived body composition associated with lung cancer recurrence after surgery. Lung Cancer 2023;179:107189. PMID: 37058786. PMCID: PMC10166196. DOI: 10.1016/j.lungcan.2023.107189.
Abstract
OBJECTIVES To evaluate the impact of body composition derived from computed tomography (CT) scans on postoperative lung cancer recurrence. METHODS We created a retrospective cohort of 363 lung cancer patients who underwent lung resection and had verified recurrence, death, or at least 5 years of follow-up without either event. Five key body tissues and ten tumor features were automatically segmented and quantified based on preoperative whole-body CT scans (acquired as part of PET-CT scans) and chest CT scans, respectively. Time-to-event analysis accounting for the competing event of death was performed to analyze the impact of body composition, tumor features, clinical information, and pathological features on lung cancer recurrence after surgery. The hazard ratio (HR) of normalized factors was used to assess individual significance, univariately and in the combined models. Five-fold cross-validated time-dependent receiver operating characteristic (ROC) analysis, with an emphasis on the area under the 3-year ROC curve (AUC), was used to characterize the ability to predict lung cancer recurrence. RESULTS Body tissues that showed standalone potential to predict lung cancer recurrence included visceral adipose tissue (VAT) volume (HR = 0.88, p = 0.047), subcutaneous adipose tissue (SAT) density (HR = 1.14, p = 0.034), inter-muscle adipose tissue (IMAT) volume (HR = 0.83, p = 0.002), muscle density (HR = 1.27, p < 0.001), and total fat volume (HR = 0.89, p = 0.050). The CT-derived muscular and tumor features significantly contributed to a model including clinicopathological factors, resulting in an AUC of 0.78 (95% CI: 0.75-0.83) for predicting recurrence at 3 years. CONCLUSIONS Body composition features (e.g., muscle density or muscle and inter-muscle adipose tissue volumes) can improve the prediction of recurrence when combined with clinicopathological factors.
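The hazard-ratio step can be outlined with the lifelines library. The sketch below fits a standard Cox proportional-hazards model on synthetic stand-in data; unlike the study, it does not model death as a competing event, and the feature names are placeholders.

```python
# A minimal sketch: hazard ratios for normalized body-composition features from a
# Cox proportional-hazards model. Synthetic data stand in for the real cohort,
# and the competing-risk adjustment used in the study is omitted here.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 363
df = pd.DataFrame({
    "muscle_density": rng.normal(size=n),      # normalized (z-scored) feature
    "imat_volume": rng.normal(size=n),         # normalized (z-scored) feature
    "time_to_event": rng.exponential(36, n),   # months to recurrence or censoring
    "recurrence": rng.integers(0, 2, n),       # 1 = recurrence observed
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time_to_event", event_col="recurrence")
cph.print_summary()  # per-feature HR = exp(coef), with confidence intervals and p-values
```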
Affiliation(s)
- Naciye S Gezer
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Andriy I Bandos
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Cameron A Beeche
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Joseph K Leader
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Rajeev Dhupar
  - Department of Cardiothoracic Surgery, Division of Thoracic and Foregut Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Surgical Services Division, Thoracic Surgery, VA Pittsburgh Healthcare System, Pittsburgh, PA 15213, USA
- Jiantao Pu
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15213, USA
7
Pu L, Gezer NS, Ashraf SF, Ocak I, Dresser DE, Dhupar R. Automated segmentation of five different body tissues on computed tomography using deep learning. Med Phys 2023;50:178-191. PMID: 36008356. PMCID: PMC11186697. DOI: 10.1002/mp.15932.
Abstract
PURPOSE To develop and validate a computer tool for automatic and simultaneous segmentation of five body tissues depicted on computed tomography (CT) scans: visceral adipose tissue (VAT), subcutaneous adipose tissue (SAT), intermuscular adipose tissue (IMAT), skeletal muscle (SM), and bone. METHODS A cohort of 100 CT scans acquired from different subjects was collected from The Cancer Imaging Archive: 50 whole-body positron emission tomography-CT scans, 25 chest scans, and 25 abdominal scans. The five body tissues (i.e., VAT, SAT, IMAT, SM, and bone) were manually annotated. A training-while-annotating strategy was used to improve annotation efficiency. The 10-fold cross-validation method was used to develop and validate the performance of several convolutional neural networks (CNNs), including UNet, Recurrent Residual UNet (R2UNet), and UNet++. A grid-based three-dimensional patch sampling operation was used to train the CNN models. The CNN models were also trained and tested separately for each body tissue to see whether this could achieve better performance than segmenting the tissues jointly. The paired-sample t-test was used to statistically assess the performance differences among the involved CNN models. RESULTS When segmenting the five body tissues simultaneously, the Dice coefficients ranged from 0.826 to 0.840 for VAT, from 0.901 to 0.908 for SAT, from 0.574 to 0.611 for IMAT, from 0.874 to 0.889 for SM, and from 0.870 to 0.884 for bone; these were significantly higher than the Dice coefficients when segmenting the body tissues separately (p < 0.05), namely, from 0.744 to 0.819 for VAT, from 0.856 to 0.896 for SAT, from 0.433 to 0.590 for IMAT, from 0.838 to 0.871 for SM, and from 0.803 to 0.870 for bone. CONCLUSION There were no significant differences among the CNN models in segmenting body tissues, but jointly segmenting the body tissues achieved better performance than segmenting them separately.
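The per-tissue Dice coefficient used for evaluation is straightforward to compute from label volumes; the NumPy sketch below assumes an illustrative label ordering (background = 0, then the five tissues), which is not taken from the paper.

```python
# A small sketch of the evaluation metric: per-class Dice coefficients for a
# joint multi-tissue segmentation, computed from integer label volumes.
import numpy as np

def dice_per_class(pred: np.ndarray, truth: np.ndarray, num_classes: int):
    """Dice = 2*|P intersect T| / (|P| + |T|) for each foreground class label."""
    scores = {}
    for c in range(1, num_classes):             # skip background (label 0)
        p, t = pred == c, truth == c
        denom = p.sum() + t.sum()
        scores[c] = 2.0 * np.logical_and(p, t).sum() / denom if denom else np.nan
    return scores

labels = ["VAT", "SAT", "IMAT", "SM", "bone"]   # assumed label order, for illustration
pred = np.random.default_rng(0).integers(0, 6, size=(32, 64, 64))   # toy 3D volumes
truth = np.random.default_rng(1).integers(0, 6, size=(32, 64, 64))
print({labels[c - 1]: round(d, 3) for c, d in dice_per_class(pred, truth, 6).items()})
```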
Affiliation(s)
- Lucy Pu
  - Department of Cardiothoracic Surgery, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
  - North Allegheny Senior High School, Wexford, USA
- Naciye S Gezer
  - Department of Radiology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Iclal Ocak
  - Department of Radiology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Daniel E. Dresser
  - Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
- Rajeev Dhupar
  - Department of Cardiothoracic Surgery, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, USA
  - Surgical Services Division, VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, USA
8
Beeche C, Gezer NS, Iyer K, Almetwali O, Yu J, Zhang Y, Dhupar R, Leader JK, Pu J. Assessing retinal vein occlusion based on color fundus photographs using neural understanding network (NUN). Med Phys 2023;50:449-464. PMID: 36184848. PMCID: PMC9868057. DOI: 10.1002/mp.16012.
Abstract
OBJECTIVE To develop and validate a novel deep learning architecture to classify retinal vein occlusion (RVO) on color fundus photographs (CFPs) and reveal the image features contributing to the classification. METHODS The neural understanding network (NUN) is formed by two components: (1) convolutional neural network (CNN)-based feature extraction and (2) graph neural network (GNN)-based feature understanding. The CNN-based image features were transformed into a graph representation to encode and visualize long-range feature interactions and to identify the image regions that contributed most to the classification decision. A total of 7062 CFPs were classified into three categories: (1) no vein occlusion ("normal"), (2) central RVO, and (3) branch RVO. The area under the receiver operating characteristic (ROC) curve (AUC) was used to assess the performance of the trained classification models. RESULTS The AUC, accuracy, sensitivity, and specificity of NUN in classifying CFPs as normal, central occlusion, or branch occlusion were 0.975 (±0.003), 0.911 (±0.007), 0.983 (±0.010), and 0.803 (±0.005), respectively, outperforming available classical CNN models. CONCLUSION The NUN architecture provides better classification performance and a more straightforward visualization of the results than CNNs.
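The core NUN idea, re-expressing CNN feature-map locations as graph nodes and applying graph convolution over similarity-based edges, can be sketched in plain PyTorch. The similarity threshold, single GCN layer, and mean pooling below are illustrative assumptions, not the paper's architecture.

```python
# A conceptual sketch: CNN feature-map locations become graph nodes, edges come
# from feature similarity, and one normalized graph-convolution step is applied.
import torch
import torch.nn as nn

feat = torch.randn(1, 32, 8, 8)            # CNN output: 32-dim features on an 8x8 grid
x = feat.flatten(2).squeeze(0).T           # 64 nodes x 32 features

# Edges: connect node pairs whose cosine similarity exceeds a threshold (assumed).
sim = torch.cosine_similarity(x.unsqueeze(1), x.unsqueeze(0), dim=-1)
adj = (sim > 0.2).float()                  # self-similarity is 1, so loops are kept

# Symmetric normalization A_hat = D^{-1/2} A D^{-1/2}, then one GCN layer.
d_inv_sqrt = adj.sum(1).pow(-0.5)
a_hat = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
gcn = nn.Linear(32, 3)                     # 3 classes: normal / central / branch RVO
logits = gcn(a_hat @ x)                    # per-node class scores
print(logits.mean(0))                      # graph-level prediction via mean pooling
```

The graph view is what makes the long-range interactions visualizable: the adjacency matrix directly records which image regions influence each other.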
Affiliation(s)
- Cameron Beeche
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Naciye S Gezer
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Kartik Iyer
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Omar Almetwali
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Juezhao Yu
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Yanchun Zhang
  - Shaanxi Eye Hospital, Xi'an, Shaanxi 710004, China
- Rajeev Dhupar
  - Department of Cardiothoracic Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Surgical Services Division, VA Pittsburgh Healthcare System, Pittsburgh, PA 15240, USA
- Joseph K. Leader
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Jiantao Pu
  - Department of Radiology, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA 15213, USA
9
Image Colorization Algorithm Based on Deep Learning. Symmetry (Basel) 2022. DOI: 10.3390/sym14112295.
Abstract
Image colorization is widely used in computer graphics and has become a research hotspot in image processing. Current colorization techniques tend to produce monotonous, unrealistic colors and are too complicated to implement and popularize. In this paper, a new method based on a convolutional neural network is proposed for plausible colorization of human images that preserves both the realism and the diversity of the coloring. First, about 5000 pictures of people and plants were selected from the ImageNet dataset to build a small dataset containing only people and backgrounds. Second, to obtain image segmentation results, the U-Net network was improved to perform three downsampling and three upsampling stages. Finally, a dilated (expanded) convolution was added, the sigmoid activation function replaced the ReLU (rectified linear unit) activation function, and BN (batch normalization) was placed before the activation function. Experimental results show that the proposed deep learning-based image colorization algorithm reduces network training time and achieves higher-quality segmentation results.
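The described modifications, three down/upsampling stages, a dilated convolution, and sigmoid in place of ReLU with BN before the activation, can be sketched in PyTorch as below; the channel counts and overall layout are illustrative assumptions, not the paper's exact configuration.

```python
# A schematic sketch of the abstract's modifications: three downsampling and three
# upsampling stages, a dilated convolution, and BatchNorm placed before a sigmoid
# activation that replaces ReLU. Channel counts are illustrative assumptions.
import torch
import torch.nn as nn

def block(cin, cout, dilation=1):
    # BN precedes the activation, and sigmoid replaces ReLU, per the abstract.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(cout),
        nn.Sigmoid(),
    )

class MiniColorizeUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.ModuleList([block(1, 16), block(16, 32), block(32, 64)])
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = block(64, 64, dilation=2)   # dilated convolution
        self.up = nn.ModuleList([
            nn.ConvTranspose2d(64, 32, 2, stride=2),
            nn.ConvTranspose2d(32, 16, 2, stride=2),
            nn.ConvTranspose2d(16, 2, 2, stride=2),   # predict 2 chroma channels
        ])

    def forward(self, x):
        for d in self.down:
            x = self.pool(d(x))                        # three downsamplings
        x = self.bottleneck(x)
        for u in self.up:
            x = u(x)                                   # three upsamplings
        return x

gray = torch.randn(1, 1, 64, 64)                       # L (lightness) channel input
print(MiniColorizeUNet()(gray).shape)                  # torch.Size([1, 2, 64, 64])
```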