1
Yang TH, Liao ZY, Yu YH, Hsia M. RDDL: A systematic ensemble pipeline tool that streamlines balancing training schemes to reduce the effects of data imbalance in rare-disease-related deep-learning applications. Comput Biol Chem 2023; 106:107929. [PMID: 37517206 DOI: 10.1016/j.compbiolchem.2023.107929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 04/19/2023] [Accepted: 07/14/2023] [Indexed: 08/01/2023]
Abstract
Identifying low-prevalence (rare) diseases in their early stages is key to disease treatment in the medical field. Deep learning techniques now provide promising tools for this purpose. Nevertheless, the low prevalence of rare diseases causes a severe class-imbalance issue that hinders the proper application of deep networks to disease identification. Over the past decades, several balancing methods have been studied to handle the data-imbalance issue. Unfortunately, it has been verified that none of these methods guarantees better performance than the others. This performance variation creates the need for a systematic pipeline, with a comprehensive software tool, for enhancing deep-learning applications in rare disease identification. We reviewed the existing balancing schemes and summarized them in a systematic deep ensemble pipeline, implemented in a tool called RDDL, for handling the data-imbalance issue. Through two real case studies, we showed that rare disease identification could be improved with this systematic RDDL pipeline tool by lessening the data-imbalance problem during model training. The RDDL pipeline tool is available at https://github.com/cobisLab/RDDL/.
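As a concrete illustration of what a balancing scheme does, the sketch below shows random oversampling of the minority class. This is one common balancing method chosen for illustration; the abstract does not say which schemes RDDL implements, and the function name, toy data, and seed are all hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels

# Hypothetical toy data: 8 common-class samples (label 0), 2 rare-class samples (label 1).
samples = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_counts = Counter(balanced_labels)
```

After the call, both classes have eight samples, and the added rare-class entries are copies of the two original rare samples.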
Affiliation(s)
- Tzu-Hsien Yang
  - Department of Biomedical Engineering, National Cheng Kung University, No. 1, University Road, Tainan 701, Taiwan; Medical Device Innovation Center, National Cheng Kung University, Tainan City 701, Taiwan.
- Zhan-Yi Liao
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.
- Yu-Huai Yu
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.
- Min Hsia
  - Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.
2
Dong J, Cheng G, Zhang Y, Peng C, Song Y, Tong R, Lin L, Chen YW. Tailored multi-organ segmentation with model adaptation and ensemble. Comput Biol Med 2023; 166:107467. [PMID: 37725849 DOI: 10.1016/j.compbiomed.2023.107467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 08/10/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023]
Abstract
Multi-organ segmentation, which identifies and separates different organs in medical images, is a fundamental task in medical image analysis. Recently, the immense success of deep learning has motivated its wide adoption in multi-organ segmentation tasks. However, because annotation demands costly labor and expertise, multi-organ annotations are usually scarce, which makes it difficult to obtain sufficient training data for deep learning-based methods. In this paper, we address this issue by combining off-the-shelf single-organ segmentation models to develop a multi-organ segmentation model on the target dataset, which removes the dependence on annotated data for multi-organ segmentation. To this end, we propose a novel dual-stage method that consists of a Model Adaptation stage and a Model Ensemble stage. The first stage enhances the generalization of each off-the-shelf segmentation model on the target domain, while the second stage distills and integrates knowledge from multiple adapted single-organ segmentation models. Extensive experiments on four abdomen datasets demonstrate that our proposed method can effectively leverage off-the-shelf single-organ segmentation models to obtain a tailored model for multi-organ segmentation with high accuracy.
Affiliation(s)
- Jiahua Dong
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
- Guohua Cheng
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
- Yue Zhang
  - Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, 215163, China; School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, 230026, China.
- Chengtao Peng
  - Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, 230026, China
- Yu Song
  - Graduate School of Information Science and Engineering, Ritsumeikan University, Shiga, 525-8577, Japan
- Ruofeng Tong
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
- Lanfen Lin
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
- Yen-Wei Chen
  - Graduate School of Information Science and Engineering, Ritsumeikan University, Shiga, 525-8577, Japan
3
Bashir J, Romshoo SA. Bias-corrected climate change projections over the Upper Indus Basin using a multi-model ensemble. Environ Sci Pollut Res Int 2023; 30:64517-64535. [PMID: 37071365 DOI: 10.1007/s11356-023-26898-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 04/05/2023] [Indexed: 05/11/2023]
Abstract
The study projects climate over the Upper Indus Basin (UIB), covering geographic areas in India, Pakistan, Afghanistan, and China, under two Representative Concentration Pathways (RCPs), viz., RCP4.5 and RCP8.5, by the late twenty-first century using the best-fit climate model validated against climate observations from eight meteorological stations. GFDL CM3 performed better than the other five evaluated climate models in simulating the climate of the UIB. The model bias was significantly reduced by the Aerts and Droogers statistical downscaling method, and the projections overall revealed a significant increase in temperature and a slight increase in precipitation across the UIB, which comprises the Jhelum, Chenab, and Indus sub-basins. By the late twenty-first century, under RCP4.5 and RCP8.5 respectively, temperature in the Jhelum is projected to increase by 3 °C and 5.2 °C and precipitation by 0.8% and 3.4%. In the Chenab, temperature is projected to increase by 3.5 °C and 4.8 °C and precipitation by 8% and 8.2% under the two scenarios. In the Indus, temperature is projected to increase by 4.8 °C and 6.5 °C and precipitation by 2.6% and 8.7% under RCP4.5 and RCP8.5. The projected late twenty-first century climate would have significant impacts on various ecosystem services and products, irrigation and socio-hydrological regimes, and various dependent livelihoods. It is therefore hoped that the high-resolution climate projections will be useful for impact assessment studies to inform policymaking for climate action in the UIB.
Affiliation(s)
- Jasia Bashir
  - Department of Geoinformatics, University of Kashmir, Jammu and Kashmir, 190006, Hazratbal, Srinagar, India
- Shakil Ahmad Romshoo
  - Department of Geoinformatics, University of Kashmir, Jammu and Kashmir, 190006, Hazratbal, Srinagar, India.
4
Yuan Q, Chen K, Yu Y, Le NQK, Chua MCH. Prediction of anticancer peptides based on an ensemble model of deep learning and machine learning using ordinal positional encoding. Brief Bioinform 2023; 24:6987656. [PMID: 36642410 DOI: 10.1093/bib/bbac630] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 08/03/2022] [Revised: 12/01/2022] [Accepted: 12/28/2022] [Indexed: 01/17/2023]
Abstract
Anticancer peptides (ACPs) are peptides that have been demonstrated to have anticancer activity. Using ACPs to prevent cancer could be a viable alternative to conventional cancer treatments because they are safer and display higher selectivity. Because identifying ACPs in the laboratory is limited, expensive and lengthy, this study proposes a computational method to predict ACPs from sequence information. The process includes the input of the peptide sequences, feature extraction in terms of ordinal encoding with positional information and handcrafted features, and finally feature selection. The whole model comprises two modules: a deep learning module and a machine learning module. The deep learning module contains two channels: bidirectional long short-term memory (BiLSTM) and convolutional neural network (CNN). Light Gradient Boosting Machine (LightGBM) is used in the machine learning module. Finally, the classification results of the three models, one per path, are combined by voting in the model ensemble layer. This study provides insights into ACP prediction using a novel method and shows promising performance. It used a benchmark dataset for further exploration and improvement compared with previous studies. Our final model has an accuracy of 0.7895, sensitivity of 0.8153 and specificity of 0.7676, an improvement of at least 2% over state-of-the-art studies in all metrics. Hence, this paper presents a novel method that can potentially predict ACPs more effectively and efficiently. The work and source code are made available to the community of researchers and developers at https://github.com/khanhlee/acp-ope/.
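The voting step in the model ensemble layer can be sketched as a hard majority vote over the three models' binary predictions. This is an assumption for illustration: the abstract does not state whether hard or soft voting is used, and the function name and the prediction lists below are hypothetical.

```python
def majority_vote(*model_predictions):
    """Combine binary (0/1) label lists from an odd number of models.
    A sample is labelled 1 when more than half of the models predict 1."""
    n_models = len(model_predictions)
    return [1 if sum(votes) * 2 > n_models else 0
            for votes in zip(*model_predictions)]

# Hypothetical per-peptide predictions (1 = anticancer, 0 = not) from the three paths.
bilstm_pred = [1, 0, 1, 0]
cnn_pred = [1, 1, 0, 0]
lightgbm_pred = [0, 1, 1, 0]

ensemble_pred = majority_vote(bilstm_pred, cnn_pred, lightgbm_pred)
```

With three voters and binary labels a tie cannot occur, which is one practical reason to combine an odd number of models.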
Affiliation(s)
- Qitong Yuan
  - Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Keyi Chen
  - Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Yimin Yu
  - Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing St, 106, Taipei, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, 250 Wuxing St, 106, Taipei, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, 252 Wuxing St, 110, Taipei, Taiwan
- Matthew Chin Heng Chua
  - Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
5
Tan Z, Shi J, Lv R, Li Q, Yang J, Ma Y, Li Y, Wu Y, Zhang R, Ma H, Li Y, Zhu L, Zhu L, Zhang X, Kong J, Yang W, Min L. Fast anther dehiscence status recognition system established by deep learning to screen heat tolerant cotton. Plant Methods 2022; 18:53. [PMID: 35449108 PMCID: PMC9026675 DOI: 10.1186/s13007-022-00884-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Accepted: 04/01/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND From an economic perspective, cotton is one of the most important crops in the world. The fertility of male reproductive organs is a key determinant of cotton yield. Anther dehiscence or indehiscence directly determines the probability of fertilization in cotton. Thus, rapid and accurate identification of cotton anther dehiscence status is important for judging anther growth status and promoting genetic breeding research. The development of computer vision technology and the advent of big data have prompted the application of deep learning techniques to agricultural phenotype research. Therefore, two deep learning models (Faster R-CNN and YOLOv5) were proposed to detect the number and dehiscence status of anthers.
RESULT The single-stage model based on YOLOv5 has a higher recognition speed and can be deployed on mobile devices. Breeding researchers can apply this model on terminals to achieve a more intuitive understanding of cotton anther dehiscence status. Moreover, three improvement strategies are proposed for the Faster R-CNN model, and the improved model has higher detection accuracy than the YOLOv5 model. After ensembling the three improved models with the original Faster R-CNN model, R² reaches 0.8765 for "open", 0.8539 for "close" and 0.8481 for "all", higher than the prediction results of any single model, so the ensemble can fully replace manual counting. We can use this model to quickly extract the dehiscence rate of cotton anthers under high temperature (HT) conditions. In addition, the percentage of dehiscent anthers of 30 cotton varieties randomly selected from the cotton population was measured under normal and HT conditions using both the Faster R-CNN ensemble and manual counting. The results show that HT decreased the percentage of dehiscent anthers in different cotton lines, consistent with the manual method.
CONCLUSIONS Deep learning technology has been applied to cotton anther dehiscence status recognition instead of manual methods for the first time to quickly screen HT-tolerant cotton varieties. Deep learning can help to explore the key genes for genetic improvement in the future, promoting cotton breeding and improvement.
Affiliation(s)
- Zhihao Tan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Jiawei Shi
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Rongjie Lv
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Qingyuan Li
  - Forestry and Fruit Tree Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan, 430075, China
- Jing Yang
  - Institute of Economic Crops, Xinjiang Academy of Agricultural Sciences, Xinjiang, 830091, China
- Yizan Ma
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Yanlong Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Yuanlong Wu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Rui Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Huanhuan Ma
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Yawei Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Li Zhu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Longfu Zhu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Xianlong Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Jie Kong
  - Institute of Economic Crops, Xinjiang Academy of Agricultural Sciences, Xinjiang, 830091, China.
- Wanneng Yang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China.
- Ling Min
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, Hubei, China.
6
Sun Z, Archibald AT. Multi-stage ensemble-learning-based model fusion for surface ozone simulations: A focus on CMIP6 models. Environ Sci Ecotechnol 2021; 8:100124. [PMID: 36156995 PMCID: PMC9488062 DOI: 10.1016/j.ese.2021.100124] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/23/2021] [Revised: 09/01/2021] [Accepted: 09/09/2021] [Indexed: 05/31/2023]
Abstract
Accurately simulating the geographical distribution and temporal variability of global surface ozone has long been one of the principal components of chemistry-climate modelling. However, the simulation outcomes have been reported to vary significantly as a result of the complex mixture of uncertain factors that control the tropospheric ozone budget. Settling the cross-model discrepancies to achieve higher-accuracy predictions of surface ozone is thus a priority, and methods that overcome structural biases in models and go beyond naïve averaging of model simulations are urgently required. Building on the Coupled Model Intercomparison Project Phase 6 (CMIP6), we have transplanted a conventional ensemble learning approach, and also constructed an innovative 2-stage enhanced space-time Bayesian neural network, to fuse an ensemble of 57 simulations together with a prescribed ozone dataset; both approaches achieved outstanding performance (R² > 0.95, RMSE < 2.12 ppbv). The conventional ensemble learning approach is computationally cheaper and results in higher overall performance, but at the expense of oceanic ozone being overestimated and the learning process being uninterpretable. The Bayesian approach performs better in spatial generalisation and enables perceivable interpretability, but imposes a heavier computational burden. Both of these multi-stage machine learning-based approaches provide frameworks for improving the fidelity of composition-climate model outputs for use in future impact studies.
Affiliation(s)
- Zhe Sun
  - Centre for Atmospheric Science, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
  - Department of Earth Sciences, University of Cambridge, Cambridge, CB2 3EQ, UK
- Alexander T. Archibald
  - Centre for Atmospheric Science, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
  - National Centre for Atmospheric Science, Cambridge, CB2 1EW, UK
7
Chowell G, Luo R, Sun K, Roosa K, Tariq A, Viboud C. Real-time forecasting of epidemic trajectories using computational dynamic ensembles. Epidemics 2019; 30:100379. [PMID: 31887571 DOI: 10.1016/j.epidem.2019.100379] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 08/12/2019] [Revised: 11/24/2019] [Accepted: 11/25/2019] [Indexed: 12/20/2022]
Abstract
Forecasting the trajectory of social dynamic processes, such as the spread of infectious diseases, poses significant challenges that call for methods that account for data and model uncertainty. Here we introduce an ensemble model for sequential forecasting that weights a set of plausible models and uses a frequentist computational bootstrap approach to evaluate its uncertainty. We demonstrate the feasibility of our approach using simple dynamic differential-equation models and the trajectory of outbreak scenarios of the Ebola Forecasting Challenge. Specifically, we generate sequential short-term forecasts of epidemic outbreaks by combining phenomenological models that incorporate flexible epidemic growth scaling, namely the Generalized-Growth Model (GGM) and the Generalized Logistic Model (GLM). We rely on the root-mean-square error (RMSE) to quantify the quality of the models' fits during the calibration periods for weighting their contribution to the ensemble model, while forecasting performance was evaluated using the RMSE of the forecasts. For a given forecasting horizon (1-4 weeks), we report the performance of each model as the percentage of times that model outperforms the other models. The overall mean RMSE performance of the GLM and the GGM-GLM ensemble models outcompeted that of participant models of the Ebola Forecasting Challenge. We also found that the ensemble model provided more accurate forecasts with higher frequency than the GGM and GLM models, but its performance varied across forecasting horizons. For instance, across all of the Ebola Challenge Scenarios, the ensemble model outperformed the other models at horizons of 2 and 3 weeks, while the GLM outperformed other models at horizons of 1 and 4 weeks.
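The RMSE-based weighting described in this abstract can be sketched as follows. This is a minimal illustration that assumes each model's weight is proportional to the inverse of its calibration RMSE; the abstract does not give the exact weighting formula, and the function names and all numbers are hypothetical stand-ins rather than fitted GGM or GLM output.

```python
import math

def rmse(observed, fitted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squared) / len(squared))

def inverse_rmse_weights(calibration_rmses):
    """Assumed weighting: weight proportional to 1 / RMSE, normalized to sum to 1."""
    inverse = [1.0 / r for r in calibration_rmses]
    total = sum(inverse)
    return [value / total for value in inverse]

def ensemble_forecast(forecasts, weights):
    """Weighted average of the per-model forecasts at each forecast step."""
    n_steps = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(n_steps)]

# Hypothetical calibration-period observations and model fits.
observed = [10, 14, 20, 27]
fit_ggm = [9, 15, 22, 30]   # stand-in for a GGM fit
fit_glm = [10, 14, 19, 26]  # stand-in for a GLM fit

weights = inverse_rmse_weights([rmse(observed, fit_ggm), rmse(observed, fit_glm)])
# Hypothetical two-step-ahead forecasts from each model.
forecast = ensemble_forecast([[40, 55], [33, 38]], weights)
```

Here the second model fits the calibration data more closely, so it receives the larger weight and the ensemble forecast lies nearer to its values.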
Affiliation(s)
- G Chowell
  - Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA; Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, MD, USA.
- R Luo
  - Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
- K Sun
  - Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
- K Roosa
  - Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
- A Tariq
  - Department of Population Health Sciences, School of Public Health, Georgia State University, Atlanta, GA, USA
- C Viboud
  - Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
8
Trolle D, Nielsen A, Andersen HE, Thodsen H, Olesen JE, Børgesen CD, Refsgaard JC, Sonnenborg TO, Karlsson IB, Christensen JP, Markager S, Jeppesen E. Effects of changes in land use and climate on aquatic ecosystems: Coupling of models and decomposition of uncertainties. Sci Total Environ 2019; 657:627-633. [PMID: 30677929 DOI: 10.1016/j.scitotenv.2018.12.055] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/09/2018] [Revised: 12/02/2018] [Accepted: 12/04/2018] [Indexed: 05/12/2023]
Abstract
To analyse the potential future ecological state of estuaries located in the temperate climate (here exemplified with the Odense Fjord estuary, Denmark), we combined end-of-the-century climate change projections from four different climate models, four contrasting land use scenarios ("Agriculture for nature", "Extensive agriculture", "High-tech agriculture" and "Market driven agriculture") and two different eco-hydrological models. By decomposing the variance of the model-simulated output from all scenario and model combinations, we identified the key sources of uncertainty in these future projections. There was generally a decline in the ecological state of the estuary in scenarios with a warmer climate. Strikingly, even the most nature-friendly land use scenario, where a proportion of the intensive agricultural area was converted to forest, may not be enough to counteract the negative effects of a future warmer climate on the ecological state of the estuary. The different land use scenarios were the most significant source of uncertainty in the projections of future ecological state, followed, in order, by eco-hydrological models and climate models, although all three sources caused high variability in the simulated outputs. Therefore, when projecting the future state of aquatic ecosystems in a global warming context, one should at the very least consider evaluating an ensemble of land use scenarios (nutrient loads), but ideally also include multiple eco-hydrological models and climate change projections. Our study may set a precedent for future attempts to predict and quantify uncertainties of model and model input ensembles, as these will likely be key elements in future tools for decision-making processes.
Affiliation(s)
- Dennis Trolle
  - Aarhus University, Department of Bioscience, Silkeborg, Denmark; Sino-Danish Center for Education and Research (SDC), Aarhus, Denmark; Sino-Danish Center for Education and Research (SDC), Beijing, China.
- Anders Nielsen
  - Aarhus University, Department of Bioscience, Silkeborg, Denmark
- Hans E Andersen
  - Aarhus University, Department of Bioscience, Silkeborg, Denmark
- Hans Thodsen
  - Aarhus University, Department of Bioscience, Silkeborg, Denmark
- Jørgen E Olesen
  - Aarhus University, Department of Agroecology, Foulum, Denmark; Sino-Danish Center for Education and Research (SDC), Aarhus, Denmark; Geological Survey of Denmark and Greenland (GEUS), Copenhagen, Denmark
- Ida B Karlsson
  - Geological Survey of Denmark and Greenland (GEUS), Copenhagen, Denmark
- Stiig Markager
  - Aarhus University, Department of Bioscience, Roskilde, Denmark
- Erik Jeppesen
  - Aarhus University, Department of Bioscience, Silkeborg, Denmark; Sino-Danish Center for Education and Research (SDC), Aarhus, Denmark; Sino-Danish Center for Education and Research (SDC), Beijing, China