1
Matsui H, Mochida K. Functional data analysis-based yield modeling in year-round crop cultivation. Horticulture Research 2024; 11:uhae144. [PMID: 38988614 PMCID: PMC11234900 DOI: 10.1093/hr/uhae144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 05/16/2024] [Indexed: 07/12/2024]
Abstract
Crop yield prediction is essential for effective agricultural management. We introduce a methodology for modeling the relationship between environmental parameters and crop yield in longitudinal crop cultivation, exemplified by strawberry and tomato production based on year-round cultivation. Employing functional data analysis (FDA), we developed a model to assess the impact of these factors on crop yield, particularly in the face of environmental fluctuation. Specifically, we demonstrate that a varying-coefficient functional regression model (VCFRM) can be used to analyze time-series data, making it possible to visualize seasonal shifts and the dynamic interplay between crop yield and environmental conditions such as solar radiation and temperature. The interpretability of our FDA-based model yields insights for optimizing growth parameters, thereby improving resource efficiency and sustainability. Our results demonstrate the feasibility of VCFRM-based yield modeling, offering strategies for stable, efficient crop production that are pivotal in addressing the challenges of climate adaptability in plant factory-based horticulture.
Affiliation(s)
- Hidetoshi Matsui
- Faculty of Data Science, Shiga University, Banba, Hikone, Shiga 522-8522, Japan
- Keiichi Mochida
- RIKEN Center for Sustainable Resource Science, Yokohama 230-0045, Japan
- Kihara Institute for Biological Research, Yokohama City University, Yokohama 244-0813, Japan
- School of Information and Data Sciences, Nagasaki University, Nagasaki 852-8521, Japan
2
Nonparametric regression and classification with functional, categorical, and mixed covariates. Adv Data Anal Classif 2022. [DOI: 10.1007/s11634-022-00513-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Abstract
We consider nonparametric prediction with multiple covariates, in particular categorical or functional predictors, or a mixture of both. The proposed method is based on an extension of the Nadaraya-Watson estimator in which a kernel function is applied to a linear combination of distance measures, each calculated on a single covariate, with weights estimated from the training data. The dependent variable can be categorical (binary or multi-class) or continuous; thus we consider both classification and regression problems. The methodology is illustrated and evaluated on artificial and real-world data. In particular, we observe that prediction accuracy can be increased, and that irrelevant noise variables can be identified or removed by ‘downgrading’ the corresponding distance measures in a completely data-driven way.
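The estimator described in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel, a root-mean-square distance for curves sampled on a common grid, and a 0/1 mismatch distance for categories. All function names are hypothetical, and the weights are fixed by hand here, whereas the abstract states that they are estimated from the training data.

```python
import math

def curve_distance(curve_a, curve_b):
    """Root-mean-square distance between two curves sampled on the same grid (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)) / len(curve_a))

def mismatch_distance(cat_a, cat_b):
    """0/1 distance for a categorical covariate (an assumed choice)."""
    return 0.0 if cat_a == cat_b else 1.0

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def nw_predict(query, train_x, train_y, distances, weights):
    """Nadaraya-Watson prediction with the kernel applied to a weighted sum of per-covariate distances."""
    kernel_values = []
    for x in train_x:
        combined = sum(w * d(q, v) for w, d, q, v in zip(weights, distances, query, x))
        kernel_values.append(gaussian_kernel(combined))
    total = sum(kernel_values)
    if total == 0.0:
        # All kernel values underflowed: fall back to the plain mean of the responses.
        return sum(train_y) / len(train_y)
    return sum(k * y for k, y in zip(kernel_values, train_y)) / total

# Toy data: each observation has one functional and one categorical covariate.
train_x = [([0.0, 0.0, 0.0], "a"), ([1.0, 1.0, 1.0], "b")]
train_y = [0.0, 10.0]
distances = (curve_distance, mismatch_distance)
query = ([0.0, 0.0, 0.0], "a")

prediction = nw_predict(query, train_x, train_y, distances, weights=(1.0, 1.0))
uninformed = nw_predict(query, train_x, train_y, distances, weights=(0.0, 0.0))
```

Setting a weight to zero removes that covariate's influence on the kernel, which is one way to read the ‘downgrading’ of noise variables; with all weights at zero the prediction reduces to the plain mean of the training responses.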
3
Fukushima A, Sugimoto M, Hiwa S, Hiroyasu T. Bayesian approach for predicting responses to therapy from high-dimensional time-course gene expression profiles. BMC Bioinformatics 2021; 22:132. [PMID: 33736614 PMCID: PMC7977599 DOI: 10.1186/s12859-021-04052-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/16/2020] [Accepted: 02/28/2021] [Indexed: 12/14/2022] Open
Abstract
Background: Historical and updated information provided by time-course data collected during an entire treatment period proves to be more useful than information provided by single-point data. Accurate predictions made using time-course data on multiple biomarkers that indicate a patient’s response to therapy contribute positively to the decision-making process associated with designing effective treatment programs for various diseases. Therefore, the development of prediction methods incorporating time-course data on multiple markers is necessary.
Results: We proposed new methods that may be used for prediction and gene selection via time-course gene expression profiles. Our prediction method consolidated multiple probabilities calculated using gene expression profiles collected over a series of time points to predict therapy response. Using two data sets collected from patients with hepatitis C virus (HCV) infection and multiple sclerosis (MS), we performed numerical experiments that predicted response to therapy and evaluated their accuracies. Our methods were more accurate than conventional methods and successfully selected genes, the functions of which were associated with the pathology of HCV infection and MS.
Conclusions: The proposed method accurately predicted response to therapy using data at multiple time points. It showed higher accuracies at early time points compared to those of conventional methods. Furthermore, this method successfully selected genes that were directly associated with diseases.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04052-4.
Affiliation(s)
- Arika Fukushima
- Graduate School of Life and Medical Sciences, Doshisha University, Kyotanabe-shi, Kyoto, 610-0321, Japan
- Masahiro Sugimoto
- Research and Development Center for Minimally Invasive Therapies, Institute of Medical Science, Tokyo Medical University, Shinjuku, Tokyo, 160-8402, Japan; Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, 997-0052, Japan
- Satoru Hiwa
- Faculty of Life and Medical Sciences, Doshisha University, Kyotanabe-shi, Kyoto, 610-0321, Japan
- Tomoyuki Hiroyasu
- Faculty of Life and Medical Sciences, Doshisha University, Kyotanabe-shi, Kyoto, 610-0321, Japan
4
Wu J, Gupta M, Hussein AI, Gerstenfeld L. Bayesian modeling of factorial time-course data with applications to a bone aging gene expression study. J Appl Stat 2020; 48:1730-1754. [PMID: 34295011 DOI: 10.1080/02664763.2020.1772733] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/31/2023]
Abstract
Many scientific studies, especially in the biomedical sciences, generate data measured simultaneously over a multitude of units, over a period of time, and under different conditions or combinations of factors. Often, an important question of interest asked relates to which units behave similarly under different conditions, but measuring the variation over time complicates the analysis significantly. In this article we address such a problem arising from a gene expression study relating to bone aging, and develop a Bayesian statistical method that can simultaneously detect and uncover signals on three levels within such data: factorial, longitudinal, and transcriptional. Our model framework considers both cluster and time-point-specific parameters and these parameters uniquely determine the shapes of the temporal gene expression profiles, allowing the discovery and characterization of latent gene clusters based on similar underlying biological mechanisms. Our methodology was successfully applied to discover transcriptional networks in a microarray data set comparing the transcriptomic changes that occurred during bone aging in male and female mice expressing one or both copies of the bromodomain (Brd2) gene, a transcriptional regulator which exhibits an age-dependent sex-linked bone loss phenotype.
Affiliation(s)
- Joseph Wu
- Boston University School of Public Health, Boston, MA, U.S.A.; Pfizer, Inc., Groton, CT, U.S.A.
5
Lin Y, Qian F, Shen L, Chen F, Chen J, Shen B. Computer-aided biomarker discovery for precision medicine: data resources, models and applications. Brief Bioinform 2020; 20:952-975. [PMID: 29194464 DOI: 10.1093/bib/bbx158] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 09/02/2017] [Revised: 10/17/2017] [Indexed: 12/21/2022] Open
Abstract
Biomarkers are a class of measurable and evaluable indicators with the potential to predict disease initiation and progression. In contrast to disease-associated factors, biomarkers hold the promise of capturing the changeable signatures of biological states. With methodological advances, computer-aided biomarker discovery has now become a burgeoning paradigm in the field of biomedical science. In recent years, 'big data' have accumulated for the systematic investigation of complex biological phenomena and have promoted the flourishing of computational methods for systems-level biomarker screening. Compared with routine wet-lab experiments, bioinformatics approaches are more efficient at decoding disease pathogenesis under a holistic framework, which favors the identification of biomarkers ranging from single molecules to molecular networks for disease diagnosis, prognosis and therapy. In this review, the concepts and characteristics of typical biomarker types, e.g. single molecular biomarkers, module/network biomarkers and cross-level biomarkers, are explained under the guidance of systems biology. Then, publicly available data resources together with some well-constructed biomarker databases and knowledge bases are introduced. Biomarker identification models using mathematical, network and machine learning theories are discussed in turn. Based on network substructural and functional evidence, a novel bioinformatics model is particularly highlighted for microRNA biomarker discovery. This article aims to give deep insights into the advantages and challenges of current computational approaches for biomarker detection, and to shed light on future directions toward precision medicine and nation-wide healthcare.
Affiliation(s)
- Yuxin Lin
- Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Fuliang Qian
- Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Li Shen
- Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Feifei Chen
- Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
- Jiajia Chen
- School of Chemistry, Biology and Material Engineering, Suzhou University of Science and Technology, China
- Bairong Shen
- Center for Systems Biology, Soochow University, Suzhou, Jiangsu, China
6
Spies D, Renz PF, Beyer TA, Ciaudo C. Comparative analysis of differential gene expression tools for RNA sequencing time course data. Brief Bioinform 2019; 20:288-298. [PMID: 29028903 PMCID: PMC6357553 DOI: 10.1093/bib/bbx115] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 05/06/2017] [Indexed: 02/05/2023] Open
Abstract
RNA sequencing (RNA-seq) has become a standard procedure to investigate transcriptional changes between conditions and is routinely used in research and clinics. While standard differential expression (DE) analysis between two conditions has been extensively studied and improved over the past decades, RNA-seq time course (TC) DE analysis algorithms are still in their early stages. In this study, we compare, for the first time, existing TC RNA-seq tools on an extensive simulation data set and validate the best performing tools on published data. Surprisingly, TC tools were outperformed by the classical pairwise comparison approach on short time series (<8 time points) in terms of overall performance and robustness to noise, mostly because of a high number of false positives, with the exception of ImpulseDE2. Overlapping the candidate lists of different tools mitigated this shortcoming, as the majority of false-positive, but not true-positive, candidates were unique to each method. On longer time series, the pairwise approach showed lower overall performance than splineTC and maSigPro, which did not identify any false-positive candidates.
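The overlap step mentioned in this abstract can be sketched as a simple vote across tools. The function name, the tool labels, the gene identifiers and the `min_tools` threshold below are hypothetical placeholders, not values taken from the study.

```python
from collections import Counter

def consensus_candidates(candidate_lists, min_tools=2):
    """Return the genes reported by at least `min_tools` of the given tools."""
    counts = Counter(
        gene
        for genes in candidate_lists.values()
        for gene in set(genes)  # count each gene at most once per tool
    )
    return {gene for gene, n in counts.items() if n >= min_tools}

# Hypothetical candidate lists from three tools.
calls = {
    "tool_a": ["g1", "g2", "g7"],
    "tool_b": ["g1", "g2", "g8"],
    "tool_c": ["g1", "g9"],
}
consensus = consensus_candidates(calls, min_tools=2)
```

Candidates reported by only one tool (here `g7`, `g8` and `g9`) are dropped, which matches the abstract's observation that most false positives were unique to a single method.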
Affiliation(s)
- Daniel Spies
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland; Life Science Zurich Graduate School, Molecular Life Science program, University of Zürich, Switzerland
- Peter F Renz
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland; Life Science Zurich Graduate School, Molecular Life Science program, University of Zürich, Switzerland
- Tobias A Beyer
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
- Constance Ciaudo
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
7
Fukushima A, Sugimoto M, Hiwa S, Hiroyasu T. Elastic net-based prediction of IFN-β treatment response of patients with multiple sclerosis using time series microarray gene expression profiles. Sci Rep 2019; 9:1822. [PMID: 30755676 PMCID: PMC6372673 DOI: 10.1038/s41598-018-38441-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/05/2018] [Accepted: 12/14/2018] [Indexed: 01/08/2023] Open
Abstract
IFN-β has been widely used to treat patients with multiple sclerosis (MS) in relapse. Accurate prediction of treatment response is important for effective personalization of treatment. Microarray data have been frequently used to discover new genes and to predict treatment responses. However, conventional analytical methods suffer from three difficulties: high dimensionality of datasets; a high degree of multi-collinearity; and achieving gene identification in time-course data. The use of Elastic net, a sparse modelling method, would mitigate the first two issues; however, Elastic net is currently unable to solve these three issues simultaneously. Here, we improved Elastic net to accommodate time-course data analyses. Numerical experiments were conducted using two time-course microarray datasets derived from peripheral blood mononuclear cells collected from patients with MS. The proposed methods successfully identified genes showing a high predictive ability for IFN-β treatment response. Bootstrap sampling resulted in an 81% and 78% accuracy for each dataset, which was significantly higher than the 71% and 73% accuracy obtained using conventional methods. Our methods selected genes showing consistent differentiation throughout all time-courses. These genes are expected to provide new predictive biomarkers that can influence IFN-β treatment for MS patients.
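For orientation, the standard Elastic net criterion that this abstract builds on is commonly written as below. This is the textbook form with a squared-error loss, shown only as a reference point; it is not the authors' time-course extension, and the symbols (response vector y, design matrix X, coefficient vector β, penalty parameters λ1 and λ2) are notation assumed here rather than taken from the paper.

```latex
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
  + \lambda_1 \lVert \beta \rVert_1
  + \lambda_2 \lVert \beta \rVert_2^2
```

The L1 term drives many coefficients to exactly zero, which addresses high dimensionality, while the squared L2 term stabilizes the estimates of correlated predictors, which addresses multi-collinearity. These correspond to the first two difficulties named in the abstract.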
Affiliation(s)
- Arika Fukushima
- Doshisha University, Graduate School of Life and Medical Sciences, Kyoto, Japan
- Masahiro Sugimoto
- Research and Development Center for Minimally Invasive Therapies Health Promotion and Preemptive Medicine, Tokyo Medical University, Shinjuku, Tokyo, 160-8402, Japan; Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, 997-0052, Japan; University of Tsukuba, Research and Development Center for Precision Medicine, Tsukuba, Ibaraki, 305-8550, Japan
- Satoru Hiwa
- Doshisha University, Graduate School of Life and Medical Sciences, Kyoto, Japan
- Tomoyuki Hiroyasu
- Doshisha University, Graduate School of Life and Medical Sciences, Kyoto, Japan
8
Abramowicz K, Häger CK, Pini A, Schelin L, Sjöstedt de Luna S, Vantini S. Nonparametric inference for functional-on-scalar linear models applied to knee kinematic hop data after injury of the anterior cruciate ligament. Scand Stat Theory Appl 2018. [DOI: 10.1111/sjos.12333] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/30/2022]
Affiliation(s)
- Konrad Abramowicz
- Department of Mathematics and Mathematical Statistics; Umeå University; Umeå Sweden
- Charlotte K. Häger
- Department of Community Medicine and Rehabilitation; Umeå University; Umeå Sweden
- Alessia Pini
- Department of Statistics, Umeå School of Business, Economics and Statistics; Umeå University; Umeå Sweden
- Department of Statistical Sciences; Università Cattolica del Sacro Cuore; Milan Italy
- Lina Schelin
- Department of Community Medicine and Rehabilitation; Umeå University; Umeå Sweden
- Department of Statistics, Umeå School of Business, Economics and Statistics; Umeå University; Umeå Sweden
- Simone Vantini
- MOX - Modelling and Scientific Computing Laboratory, Department of Mathematics; Politecnico di Milano; Milan Italy
9
Tsagris M, Lagani V, Tsamardinos I. Feature selection for high-dimensional temporal data. BMC Bioinformatics 2018; 19:17. [PMID: 29357817 PMCID: PMC5778658 DOI: 10.1186/s12859-018-2023-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/17/2017] [Accepted: 01/11/2018] [Indexed: 12/14/2022] Open
Abstract
Background: Feature selection is commonly employed for identifying collectively predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work we extend established constraint-based feature-selection methods to high-dimensional "omics" temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables.
Results: The algorithm is able to return multiple, equivalent solution subsets of variables, scale to tens of thousands of features, and outperform or be on par with existing methods depending on the specifics of the analysis task.
Conclusions: The use of this algorithm is suggested for variable selection with high-dimensional temporal data.
Affiliation(s)
- Michail Tsagris
- Department of Computer Science, University of Crete, Voutes Campus, Heraklion, 70013 Greece
- Vincenzo Lagani
- Department of Computer Science, University of Crete, Voutes Campus, Heraklion, 70013 Greece
- Ioannis Tsamardinos
- Department of Computer Science, University of Crete, Voutes Campus, Heraklion, 70013 Greece
10
Santra T, Roche S, Conlon N, O’Donovan N, Crown J, O’Connor R, Kolch W. Identification of potential new treatment response markers and therapeutic targets using a Gaussian process-based method in lapatinib insensitive breast cancer models. PLoS One 2017; 12:e0177058. [PMID: 28481952 PMCID: PMC5421758 DOI: 10.1371/journal.pone.0177058] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/12/2016] [Accepted: 04/23/2017] [Indexed: 12/15/2022] Open
Abstract
Molecularly targeted therapeutics hold the promise of revolutionizing treatments of advanced malignancies. However, a large number of patients do not respond to these treatments. Here, we take a systems biology approach to understand the molecular mechanisms that prevent breast cancer (BC) cells from responding to lapatinib, a dual kinase inhibitor that targets human epidermal growth factor receptor 2 (HER2) and epidermal growth factor receptor (EGFR). To this end, we analysed temporal gene expression profiles of four BC cell lines, two of which respond to lapatinib and two of which do not. For this analysis, we developed a Gaussian process-based algorithm which can accurately find differentially expressed genes by analysing time course gene expression profiles at a fraction of the computational cost of other state-of-the-art algorithms. Our analysis identified 519 potential genes which are characteristic of lapatinib non-responsiveness in the tested cell lines. Data from the Genomics of Drug Sensitivity in Cancer (GDSC) database suggested that the basal expression of 120 of the above genes correlates with the response of BC cells to HER2 and/or EGFR targeted therapies. We selected 27 genes from the larger panel of 519 genes for experimental verification and 16 of these were successfully validated. Further bioinformatics analysis identified vitamin D receptor (VDR) as a potential target of interest for lapatinib non-responsive BC cells. Experimentally, calcitriol, a commonly used reagent for VDR targeted therapy, in combination with lapatinib additively inhibited proliferation in two HER2 positive cell lines, lapatinib insensitive MDA-MB-453 and lapatinib resistant HCC 1954-L cells.
Affiliation(s)
- Tapesh Santra
- Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Sandra Roche
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- Neil Conlon
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- Norma O’Donovan
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- John Crown
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- Department of Medical Oncology, St Vincent’s University Hospital, Dublin, Elm Park, Ireland
- Robert O’Connor
- National Institute for Cellular Biotechnology, Dublin City University, Dublin, Ireland
- Walter Kolch
- Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin, Ireland
- School of Medicine, University College Dublin, Belfield, Dublin, Ireland