1
Han M, Cui G, Zhao Y, Zuo X, Wang X, Zhang X, Mi N, Jin J, Xiao C, Wang J, Wu W, Li Y, Li J. Evaluation of drug-drug interaction between Suraxavir Marboxil (GP681) and itraconazole, and assessment of the impact of gene polymorphism. Front Pharmacol 2025; 15:1505557. [PMID: 40291342 PMCID: PMC12022903 DOI: 10.3389/fphar.2024.1505557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Accepted: 12/19/2024] [Indexed: 04/30/2025]
Abstract
Introduction Suraxavir Marboxil (GP681) is a prodrug that is metabolized to GP1707D07, which inhibits influenza viral replication by targeting the cap-dependent endonuclease after a single oral dose. This study assessed the in vivo drug-drug interaction (DDI) potential between GP681 (including its major metabolite GP1707D07, a CYP3A4 substrate) and itraconazole in healthy Chinese subjects, along with the safety profile during co-administration. It also evaluated the impact of CYP1A2, CYP2C19, and CYP3A4 gene polymorphisms on GP1707D07 metabolism. Methods Twelve healthy adult subjects were enrolled to receive GP681 monotherapy and GP681-itraconazole co-administration in a fixed sequence. Single nucleotide polymorphisms (SNPs) at CYP gene loci were also analyzed. Results Co-administration of itraconazole increased GP1707D07 AUC0-∞ about 2.5-fold and Cmax about 1.4-fold compared with GP681 administered alone. Differences in systemic exposure were more pronounced during the terminal elimination phase than during the early phase of GP1707D07 metabolism. No significant increase in adverse events was observed during co-administration. Using a random forest algorithm, we estimated that the effects of the cytochrome P450 enzymes followed the order CYP3A4 > CYP1A2 > CYP2C19. We also hypothesized that CYP3A4 rs4646437 A>G, CYP3A4 rs2246709 G>A, and CYP2C19 rs12768009 A>G are mutations that enhance enzyme activity, whereas CYP1A2 rs762551 C>A weakens it. Discussion The pharmacokinetic changes of GP1707D07 during itraconazole co-administration are insufficient to warrant clinical action. The random forest algorithm enhances the understanding of pharmacogenetic variants involved in GP1707D07 metabolism and may serve as a powerful tool for assessing gene polymorphism data in small clinical samples. Clinical Trial Registration clinicaltrials.gov, identifier NCT05789342.
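As a rough illustration of how AUC0-∞ and its fold change between periods are commonly derived, here is a minimal non-compartmental sketch. It assumes the linear trapezoidal rule, a terminal rate constant λz fitted by log-linear least squares on the last three samples, and a paired geometric-mean ratio as the DDI summary; the function names and the mono-exponential test profile are hypothetical and are not the study's data or analysis software.

```python
import math


def auc_0_inf(times, concs, n_terminal=3):
    """AUC from time 0 to infinity for one concentration-time profile.

    Assumption: linear trapezoids over observed samples, plus an
    extrapolated tail C_last / lambda_z (real analyses may use
    linear-up/log-down and a data-driven choice of terminal points).
    """
    # Linear trapezoidal rule over consecutive sampling times
    auc_last = sum(
        (t2 - t1) * (c1 + c2) / 2.0
        for t1, t2, c1, c2 in zip(times, times[1:], concs, concs[1:])
    )
    # lambda_z: negative least-squares slope of ln(C) versus t on the last points
    ts = times[-n_terminal:]
    logs = [math.log(c) for c in concs[-n_terminal:]]
    t_bar = sum(ts) / len(ts)
    y_bar = sum(logs) / len(logs)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, logs)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    lambda_z = -slope
    return auc_last + concs[-1] / lambda_z


def geometric_mean_ratio(test, reference):
    """Paired ratio of geometric means, e.g. per-subject AUC with / without an inhibitor."""
    diffs = [math.log(t) - math.log(r) for t, r in zip(test, reference)]
    return math.exp(sum(diffs) / len(diffs))


# Hypothetical mono-exponential profile (exact AUC0-inf = 10 / 0.1 = 100)
times = [0, 1, 2, 4, 8, 12, 24]
conc = [10 * math.exp(-0.1 * t) for t in times]
print(auc_0_inf(times, conc))
```

On this profile the result lands slightly above 100, because straight-line trapezoids overestimate a convex decline. Applying geometric_mean_ratio to per-subject AUC0-∞ (co-administration versus GP681 alone) gives the kind of fold-change point estimate quoted in the abstract.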
Affiliation(s)
- Mai Han
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
- Gang Cui
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
- Yan Zhao
- Qingfeng Pharmaceutical Group Co., Ltd., Ganzhou, Jiangxi, China
- Xianbo Zuo
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
- Xiaoxue Wang
- Department of Pharmacy, State Key Laboratory of Respiratory Health and Multimorbidity, China-Japan Friendship Hospital, Beijing, China
- Xin Zhang
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
- Na Mi
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
- Jiangli Jin
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
- Chunyan Xiao
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
- Jing Wang
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
- Wei Wu
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
- Yajuan Li
- Qingfeng Pharmaceutical Group Co., Ltd., Ganzhou, Jiangxi, China
- Jintong Li
- Drug Clinical Trial Research Center, China-Japan Friendship Hospital, Beijing, China
2
Nourisa J, Passemiers A, Shakeri F, Omidi M, Helmholz H, Raimondi D, Moreau Y, Tomforde S, Schlüter H, Luthringer-Feyerabend B, Cyron CJ, Aydin RC, Willumeit-Römer R, Zeller-Plumhoff B. Gene regulatory network analysis identifies MYL1, MDH2, GLS, and TRIM28 as the principal proteins in the response of mesenchymal stem cells to Mg2+ ions. Comput Struct Biotechnol J 2024; 23:1773-1785. [PMID: 38689715 PMCID: PMC11058716 DOI: 10.1016/j.csbj.2024.04.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 04/12/2024] [Accepted: 04/12/2024] [Indexed: 05/02/2024]
Abstract
Magnesium (Mg)-based implants have emerged as a promising alternative for orthopedic applications, owing to their bioactive properties and biodegradability. As the implants degrade, Mg2+ ions are released, influencing all surrounding cell types, especially mesenchymal stem cells (MSCs). MSCs are vital for bone tissue regeneration; therefore, understanding their molecular response to Mg2+ ions is essential to maximize the potential of Mg-based biomaterials. In this study, we conducted a gene regulatory network (GRN) analysis to examine the molecular responses of MSCs to Mg2+ ions, constructing the GRNs from time-series proteomics data collected at 11 time points over a 21-day period. We studied the impact of Mg2+ ions on the resulting networks and identified the key proteins and protein interactions affected by the application of Mg2+ ions. Our analysis highlights MYL1, MDH2, GLS, and TRIM28 as the primary targets of Mg2+ ions in the response of MSCs during days 1-21. Our results also identify MDH2-MYL1, MDH2-RPS26, TRIM28-AK1, TRIM28-SOD2, and GLS-AK1 as the critical protein relationships affected by Mg2+ ions. By offering a comprehensive understanding of the regulatory role of Mg2+ ions in MSCs, our study contributes valuable insights into the molecular response of MSCs to Mg-based materials, thereby facilitating the development of innovative therapeutic strategies for orthopedic applications.
Affiliation(s)
- Jalil Nourisa
- Institute of Material Systems Modeling, Helmholtz Zentrum Hereon, Geesthacht, Germany
- Farhad Shakeri
- Institute of Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn, Bonn, Germany
- Maryam Omidi
- Institute of Clinical Chemistry/Central Laboratories, University Medical Center Hamburg, Hamburg, Germany
- Heike Helmholz
- Institute of Metallic Biomaterials, Helmholtz Zentrum Hereon, Geesthacht, Germany
- Sven Tomforde
- Department of Computer Science, Intelligent Systems, University of Kiel, Kiel, Germany
- Hartmuth Schlüter
- Institute of Clinical Chemistry and Laboratory Medicine Diagnostic Center, University of Hamburg, Hamburg, Germany
- Christian J. Cyron
- Institute of Material Systems Modeling, Helmholtz Zentrum Hereon, Geesthacht, Germany
- Institute for Continuum and Material Mechanics, Hamburg University of Technology, Hamburg, Germany
- Roland C. Aydin
- Institute of Material Systems Modeling, Helmholtz Zentrum Hereon, Geesthacht, Germany
- Institute for Continuum and Material Mechanics, Hamburg University of Technology, Hamburg, Germany
3
Hahn L, Kurtz C, de Paula BV, Feltrim AL, Higashikawa FS, Moreira C, Rozane DE, Brunetto G, Parent LÉ. Feature-specific nutrient management of onion (Allium cepa) using machine learning and compositional methods. Sci Rep 2024; 14:6034. [PMID: 38472199 DOI: 10.1038/s41598-024-55647-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 02/26/2024] [Indexed: 03/14/2024]
Abstract
While onion cultivars, irrigation, and soil and crop management have received much attention in Brazil as ways to boost onion yields, nutrient management at field scale remains challenging because of large uncertainty in fertilizer dosage. Our objective was to develop an accurate feature-based fertilization model for onion crops. We assembled climatic, edaphic, and managerial features, as well as tissue tests, into a database of 1182 observations from multi-environment fertilizer trials conducted over 13 years in southern Brazil. The complexity of onion cropping systems was captured by machine learning (ML) methods. The RReliefF ranking algorithm showed that the split-N dosage and soil tests for micronutrients and S were the most relevant features for predicting bulb yield. The decision-tree-based random forest and extreme gradient boosting models predicted bulb yield accurately from the relevant predictors (R² > 90%). As shown by the gain ratio, the foliar nutrient standards of nutritionally balanced, high-yielding specimens (producing > 50 Mg bulbs ha⁻¹) identified by the ML classification models differed among cultivars. Cultivar × environment interactions support documenting local nutrient diagnosis. The split-N dosage was the most relevant controllable feature for future universality tests designed to assess the models' ability to generalize to growers' fields.
Affiliation(s)
- Leandro Hahn
- Caçador Experimental Station, Research and Rural Extension of Santa Catarina (Epagri), Epagri, Abílio Franco Street, 1500, Caçador, Santa Catarina, 89501-032, Brazil
- Claudinei Kurtz
- Ituporanga Experimental Station, Research and Rural Extension of Santa Catarina (Epagri), Epagri, Lageado Águas Negras General Road, Ituporanga, Santa Catarina, 88400-000, Brazil
- Betania Vahl de Paula
- Department of Soil, Federal University of Santa Maria, Ave. Roraima, 1000, Building 42, Santa Maria, RS, 97105-900, Brazil
- Anderson Luiz Feltrim
- Caçador Experimental Station, Research and Rural Extension of Santa Catarina (Epagri), Epagri, Abílio Franco Street, 1500, Caçador, Santa Catarina, 89501-032, Brazil
- Fábio Satoshi Higashikawa
- Ituporanga Experimental Station, Research and Rural Extension of Santa Catarina (Epagri), Epagri, Lageado Águas Negras General Road, Ituporanga, Santa Catarina, 88400-000, Brazil
- Camila Moreira
- University Alto Vale do Rio do Peixe, Uniarp, Victor Baptista Adami Street, 800, Caçador, Santa Catarina, 89500-000, Brazil
- Danilo Eduardo Rozane
- State University Paulista "Julio Mesquita Filho", Campus Registro, Registro, Av. Nelson Brihi Badur, 430, São Paulo, 11900-000, Brazil
- Gustavo Brunetto
- Department of Soil, Federal University of Santa Maria, Ave. Roraima, 1000, Building 42, Santa Maria, RS, 97105-900, Brazil
- Léon-Étienne Parent
- Department of Soil, Federal University of Santa Maria, Ave. Roraima, 1000, Building 42, Santa Maria, RS, 97105-900, Brazil
- Department of Soils and Agrifood Engineering, Laval University, Quebec, QC, G1V 0A6, Canada
4
Cao X, Zhang L, Islam MK, Zhao M, He C, Zhang K, Liu S, Sha Q, Wei H. TGPred: efficient methods for predicting target genes of a transcription factor by integrating statistics, machine learning and optimization. NAR Genom Bioinform 2023; 5:lqad083. [PMID: 37711605 PMCID: PMC10498345 DOI: 10.1093/nargab/lqad083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 05/30/2023] [Accepted: 08/30/2023] [Indexed: 09/16/2023]
Abstract
Four statistical selection methods for inferring transcription factor (TF)-target gene (TG) pairs were developed by coupling the mean squared error (MSE) or Huber loss function with an elastic net (ENET) or least absolute shrinkage and selection operator (Lasso) penalty. Two methods were also developed for inferring pathway gene regulatory networks (GRNs) by combining the Huber or MSE loss function with a network (Net)-based penalty. To solve these regressions, we improved an accelerated proximal gradient descent (APGD) algorithm to optimize the parameter selection process, resulting in an algorithm that is as effective as, but much faster than, the commonly used convex optimization solver. Synthetic data generated in a general setting were used to test the four TF-TG identification methods; ENET-based methods performed better than Lasso-based methods. Synthetic data generated from two network settings were used to test Huber-Net and MSE-Net, which outperformed all other methods. The TF-TG identification methods were also tested with SND1 and gl3 overexpression transcriptomic data; Huber-ENET and MSE-ENET outperformed all other methods when genome-wide predictions were performed. The TF-TG identification methods fill the gap left by the lack of a method for genome-wide prediction of the TGs of a TF and have potential for validating ChIP/DAP-seq results, while the two Net-based methods are instrumental for predicting pathway GRNs.
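The regressions described above pair a loss (MSE or Huber) with a sparsity penalty and are solved by accelerated proximal gradient descent. The sketch below shows that general technique, standard FISTA with a fixed step 1/L, for the Huber-ENET case. The function names, the conservative Lipschitz bound, the iteration count, and the toy data are assumptions; this is not the authors' improved APGD or their TGPred code.

```python
import math


def huber_psi(r, delta):
    """Derivative of the Huber loss: r near zero, clipped to +/-delta beyond."""
    return max(-delta, min(delta, r))


def soft_threshold(v, tau):
    """Proximal operator of tau * |v| (the L1 part of the elastic net)."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def huber_enet_fista(X, y, lam, alpha=0.5, delta=1.0, n_iter=500):
    """Minimize (1/n) sum Huber(y - X b) + lam * (alpha*|b|_1 + (1-alpha)/2 * |b|^2).

    Hypothetical helper using FISTA; X is a list of rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    # Assumed step size: ||X||_F^2 / n upper-bounds the Huber part's Lipschitz constant
    lipschitz = sum(v * v for row in X for v in row) / n + lam * (1 - alpha)
    step = 1.0 / lipschitz
    beta = [0.0] * p
    z = beta[:]
    t = 1.0
    for _ in range(n_iter):
        # Gradient of the smooth part (Huber loss + ridge term) at the extrapolated point z
        resid = [yi - sum(xij * zj for xij, zj in zip(row, z)) for row, yi in zip(X, y)]
        psi = [huber_psi(r, delta) for r in resid]
        grad = [
            -sum(psi[i] * X[i][j] for i in range(n)) / n + lam * (1 - alpha) * z[j]
            for j in range(p)
        ]
        # Proximal step: soft-thresholding handles the non-smooth L1 term
        beta_new = [soft_threshold(z[j] - step * grad[j], step * lam * alpha) for j in range(p)]
        # Nesterov momentum: extrapolate along the latest change in beta
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [b + ((t - 1.0) / t_new) * (b - b_old) for b, b_old in zip(beta_new, beta)]
        beta, t = beta_new, t_new
    return beta


# Hypothetical toy data: column 0 drives y (true coefficient 2); column 1 is irrelevant
X = [[1, 1], [-1, 1], [2, -1], [-2, -1]]
y = [2, -2, 4, -4]
print(huber_enet_fista(X, y, lam=0.01))
```

Because the second column is orthogonal to both the first column and y, the L1 prox step keeps its coefficient at exactly zero, while the first coefficient is shrunk slightly below 2. That exact-zero behavior is what lets such penalties select TF-TG pairs.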
Affiliation(s)
- Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Ling Zhang
- Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Md Khairul Islam
- Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
- Mingxia Zhao
- Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Cheng He
- Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Sanzhen Liu
- Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Hairong Wei
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA
- Computational Science and Engineering Program, Michigan Technological University, Houghton, MI 49931, USA
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA
5
Pomiès L, Brouard C, Duruflé H, Maigné É, Carré C, Gody L, Trösser F, Katsirelos G, Mangin B, Langlade NB, de Givry S. Gene regulatory network inference methodology for genomic and transcriptomic data acquired in genetically related heterozygote individuals. Bioinformatics 2022; 38:4127-4134. [PMID: 35792837 DOI: 10.1093/bioinformatics/btac445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 06/17/2022] [Accepted: 07/05/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Inferring gene regulatory networks in non-independent, genetically related panels is a methodological challenge. This hampers evolutionary and biological studies of heterozygote individuals, such as wild sunflower populations or cultivated hybrids. RESULTS First, we simulated 100 datasets of gene expression and polymorphisms displaying the same gene expression distributions, heterozygosities and heritabilities as our dataset of 173 genes and 353 genotypes measured in sunflower hybrids. Second, we performed a meta-analysis based on six inference methods [least absolute shrinkage and selection operator (Lasso), Random Forests, Bayesian Networks, Markov Random Fields, Ordinary Least Squares and fast inference of networks from directed regulation (Findr)] and selected the minimal-density networks for better accuracy, obtaining, on average, 64 edges connecting 79 genes and an area under the precision-recall curve (AUPR) of 0.35. We found that triangles and mutual edges are prone to errors in the inferred networks. Applied to classical datasets without heterozygotes, our strategy produced an AUPR of 0.65 on one dataset of the DREAM5 Systems Genetics Challenge. Finally, we applied our method to an experimental dataset from sunflower hybrids and inferred a network of 105 genes connected by 106 putative regulations with a major connected component. AVAILABILITY AND IMPLEMENTATION Our inference methodology dedicated to genomic and transcriptomic data is available at https://forgemia.inra.fr/sunrise/inference_methods. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
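The AUPR scores quoted above summarize how well a ranked list of predicted edges recovers a gold-standard network. A minimal sketch, assuming each predicted edge carries a confidence score and AUPR is estimated as average precision (the mean of the precision values at the rank of each true edge); the function name and toy network are hypothetical, and other tools may interpolate the precision-recall curve differently, so exact values can differ slightly.

```python
def average_precision(scored_edges, true_edges):
    """Average precision (a common AUPR estimate) for a ranked list of directed edges.

    scored_edges: iterable of (regulator, target, score); higher score = more confident.
    true_edges: set of (regulator, target) pairs in the gold-standard network.
    """
    ranked = sorted(scored_edges, key=lambda e: e[2], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (reg, tgt, _score) in enumerate(ranked, start=1):
        if (reg, tgt) in true_edges:
            hits += 1
            # Precision at the moment this true edge is recovered
            precision_sum += hits / rank
    # True edges that were never predicted contribute zero precision
    return precision_sum / len(true_edges) if true_edges else 0.0


# Hypothetical gold standard and prediction: one false positive ranked second
true_edges = {("A", "B"), ("B", "C")}
predicted = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7)]
print(average_precision(predicted, true_edges))  # (1/1 + 2/3) / 2
```

Edges are treated as directed pairs, so a mutual edge (A to B and B to A) counts as two separate predictions, each of which can be right or wrong on its own.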
Affiliation(s)
- Lise Pomiès
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- Céline Brouard
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- Harold Duruflé
- LIPME, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan 31326, France
- Élise Maigné
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- Clément Carré
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- Louise Gody
- LIPME, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan 31326, France
- Fulya Trösser
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
- George Katsirelos
- MIA-Paris, AgroParisTech, Université Paris-Saclay, INRAE, Paris 75231, France
- Brigitte Mangin
- LIPME, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan 31326, France
- Nicolas B Langlade
- LIPME, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan 31326, France
- Simon de Givry
- MIAT, Université Fédérale de Toulouse, INRAE, Castanet-Tolosan 31326, France
6
Yan J, Wang X. Unsupervised and semi-supervised learning: the next frontier in machine learning for plant systems biology. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 111:1527-1538. [PMID: 35821601 DOI: 10.1111/tpj.15905] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/20/2022] [Revised: 07/05/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Advances in high-throughput omics technologies are leading plant biology research into the era of big data. Machine learning (ML) plays an important role in plant systems biology because of its excellent performance and wide application in the analysis of big data. However, to achieve ideal performance, supervised ML algorithms require large numbers of labeled samples as training data. In some cases, it is impossible or prohibitively expensive to obtain enough labeled training data; in such cases, the paradigms of unsupervised learning (UL) and semi-supervised learning (SSL) play an indispensable role. In this review, we first introduce the basic concepts of ML techniques, as well as some representative UL and SSL algorithms, including clustering, dimensionality reduction, self-supervised learning (self-SL), positive-unlabeled (PU) learning and transfer learning. We then review recent advances and applications of UL and SSL paradigms in both plant systems biology and plant phenotyping research. Finally, we discuss the limitations and highlight the significance and challenges of UL and SSL strategies in plant systems biology.
Affiliation(s)
- Jun Yan
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100094, China
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Xiangfeng Wang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100094, China
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
7
Forecasting Water Temperature in Cascade Reservoir Operation-Influenced River with Machine Learning Models. WATER 2022. [DOI: 10.3390/w14142146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Water temperature (WT) is a critical control on various physical and biochemical processes in riverine systems. Although the prediction of river water temperature has been the subject of extensive research, very few studies have examined the relative importance of the factors affecting WT and how to accurately estimate WT under the effects of cascaded dams. In this study, a series of potential influencing variables, such as air temperature, dew point temperature, river discharge, day of year, wind speed and precipitation, were used to forecast daily river water temperature downstream of cascaded dams. First, the permutation importance of the influencing variables was ranked in six different machine learning models: decision tree (DT), random forest (RF), gradient boosting (GB), adaptive boosting (AB), support vector regression (SVR) and multilayer perceptron neural network (MLPNN) models. The results showed that day of year (DOY) plays the most important role in each model for the prediction of WT, followed by flow and temperature, which are two commonly important factors in unregulated rivers. Then, combinations of the three most important inputs were used to develop the most parsimonious model based on each of the six machine learning models, and their performance was compared using statistical metrics. GB3 and RF3 gave the most accurate forecasts for the training dataset and the test dataset, respectively. Overall, the results showed that machine learning models can be effectively applied to predict river water temperature under the regulation of cascaded dams.
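Permutation importance, used above to rank the candidate inputs, measures how much a fitted model's error grows when one input column is randomly shuffled, breaking its link to the target. A model-agnostic sketch; the `predict` callable, the RMSE metric, the repeat count, and the toy data are assumptions, and none of the study's six fitted models is reproduced here.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled.

    predict: hypothetical callable mapping one row (a list) to a prediction.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with feature j replaced by a shuffled value
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical data: the "model" uses only feature 0 (a day-of-year-like input)
X = [[float(d), float(d % 7)] for d in range(1, 21)]
y = [0.3 * row[0] for row in X]
print(permutation_importance(lambda row: 0.3 * row[0], X, y))
```

A feature the model never uses scores exactly zero, while shuffling an informative feature inflates the error; ranking the mean increases gives an ordering analogous to the DOY, flow, temperature ranking reported above.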
8
Beier S, Stiegler M, Hitzenhammer E, Monika S. Screening for genes involved in cellulase regulation by expression under the control of a novel constitutive promoter in Trichoderma reesei. CURRENT RESEARCH IN BIOTECHNOLOGY 2022. [DOI: 10.1016/j.crbiot.2022.04.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/15/2023]
9
Alvarez JM, Brooks MD, Swift J, Coruzzi GM. Time-Based Systems Biology Approaches to Capture and Model Dynamic Gene Regulatory Networks. ANNUAL REVIEW OF PLANT BIOLOGY 2021; 72:105-131. [PMID: 33667112 PMCID: PMC9312366 DOI: 10.1146/annurev-arplant-081320-090914] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 05/13/2023]
Abstract
All aspects of transcription and its regulation involve dynamic events. However, capturing these dynamic events in gene regulatory networks (GRNs) offers both a promise and a challenge. The promise is that capturing and modeling the dynamic changes in GRNs will allow us to understand how organisms adapt to a changing environment. The ability to mount a rapid transcriptional response to environmental changes is especially important in nonmotile organisms such as plants. The challenge is to capture these dynamic, genome-wide events and model them in GRNs. In this review, we cover recent progress in capturing dynamic interactions of transcription factors with their targets (at both the local and genome-wide levels) and how these data are used to learn how GRNs operate as a function of time. We also discuss recent advances that employ time-based machine learning approaches to forecast gene expression at future time points, a key goal of systems biology.
Affiliation(s)
- Jose M Alvarez
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- ANID-Millennium Science Initiative Program-Millennium Institute for Integrative Biology (iBio), Santiago, Chile
- Matthew D Brooks
- Global Change and Photosynthesis Research Unit, US Department of Agriculture Agricultural Research Service, Urbana, Illinois 61801, USA
- Joseph Swift
- Salk Institute for Biological Studies, La Jolla, California 92037, USA
- Gloria M Coruzzi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA
10
Deng W, Zhang K, He C, Liu S, Wei H. HB-PLS: A statistical method for identifying biological process or pathway regulators by integrating Huber loss and Berhu penalty with partial least squares regression. FORESTRY RESEARCH 2021; 1:6. [PMID: 39524509 PMCID: PMC11524267 DOI: 10.48130/fr-2021-0006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/11/2021] [Accepted: 03/11/2021] [Indexed: 11/16/2024]
Abstract
Gene expression data feature high dimensionality, multicollinearity, and non-Gaussian noise, which pose hurdles for identifying the true regulatory genes controlling a biological process or pathway. In this study, we integrated the Huber loss function and the Berhu penalty (HB) into the partial least squares (PLS) framework to deal with the high dimensionality and multicollinearity of gene expression data, and developed a new method called HB-PLS regression to model the relationships between regulatory genes and pathway genes. To solve the Huber-Berhu optimization problem, we developed an accelerated proximal gradient descent algorithm that is at least 10 times faster than the general convex optimization solver (CVX). Application of HB-PLS to recognize pathway regulators of lignin biosynthesis and photosynthesis in Arabidopsis thaliana led to the identification of many known positive pathway regulators that had previously been experimentally validated. Compared with sparse partial least squares (SPLS) regression, an efficient method for variable selection and dimension reduction in handling multicollinearity, HB-PLS identified more known positive regulators and had a much higher but slightly less sensitivity/(1-specificity) in ranking the true positive known regulators to the top of the output regulatory gene lists for the two aforementioned pathways. In addition, each method identified some unique regulators that the other method could not. Our results showed that the overall performance of HB-PLS slightly exceeds that of SPLS, but both methods are instrumental for identifying real pathway regulators from high-throughput gene expression data, suggesting that integrating statistics, machine learning and convex optimization can yield a method with high efficacy that is worth further exploration.
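The Berhu ("reverse Huber") penalty behaves like the Lasso's |x| for small coefficients and like a ridge penalty for large ones, and its proximal operator has a closed form, which is what makes it convenient inside a proximal gradient solver. A minimal sketch, assuming the common definition B_δ(x) = |x| for |x| ≤ δ and (x² + δ²)/(2δ) otherwise; the function names and parameter values are illustrative, and this is not the HB-PLS implementation.

```python
import math


def berhu(x, delta):
    """Reverse Huber penalty: L1 near zero, quadratic beyond delta (continuous at delta)."""
    ax = abs(x)
    return ax if ax <= delta else (x * x + delta * delta) / (2 * delta)


def prox_berhu(v, lam, delta):
    """argmin_x 0.5 * (x - v)**2 + lam * berhu(x, delta), solved piecewise."""
    av = abs(v)
    if av <= lam:
        return 0.0                          # small inputs are zeroed (Lasso-like)
    if av <= lam + delta:
        return math.copysign(av - lam, v)   # soft-thresholding region, |x| <= delta
    return v * delta / (delta + lam)        # ridge-like proportional shrinkage, |x| > delta


for v in (0.5, 1.5, 4.0):
    print(v, prox_berhu(v, lam=1.0, delta=1.0))
```

The two nonzero branches agree at |v| = λ + δ (both give ±δ), so the operator is continuous; the piecewise form follows from setting the derivative of the objective to zero separately on |x| ≤ δ and |x| > δ.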
Affiliation(s)
- Wenping Deng
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, Michigan 49931, United States of America
- Kui Zhang
- Department of Mathematical Science, Michigan Technological University, Houghton, Michigan 49931, United States of America
- Cheng He
- Department of Plant Pathology, Kansas State University, Manhattan, Kansas 66506, United States of America
- Sanzhen Liu
- Department of Plant Pathology, Kansas State University, Manhattan, Kansas 66506, United States of America
- Hairong Wei
- College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, Michigan 49931, United States of America
11
Morris JS, Luthra R, Liu Y, Duose DY, Lee W, Reddy NG, Windham J, Chen H, Tong Z, Zhang B, Wei W, Ganiraju M, Broom BM, Alvarez HA, Mejia A, Veeranki O, Routbort MJ, Morris VK, Overman MJ, Menter D, Katkhuda R, Wistuba II, Davis JS, Kopetz S, Maru DM. Development and Validation of a Gene Signature Classifier for Consensus Molecular Subtyping of Colorectal Carcinoma in a CLIA-Certified Setting. Clin Cancer Res 2020; 27:120-130. [PMID: 33109741 DOI: 10.1158/1078-0432.ccr-20-2403] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/12/2020] [Revised: 09/28/2020] [Accepted: 10/23/2020] [Indexed: 11/16/2022]
Abstract
PURPOSE Consensus molecular subtyping (CMS) of colorectal cancer has the potential to reshape the colorectal cancer landscape. We developed and validated an assay that is applicable to formalin-fixed, paraffin-embedded (FFPE) samples of colorectal cancer and implemented the assay in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. EXPERIMENTAL DESIGN We performed an in silico experiment to build an optimal CMS classifier using a training set of 1,329 samples from 12 studies and a validation set of 1,329 samples from 14 studies. We constructed an assay based on NanoString CodeSets for the top 472 genes and performed analyses on paired flash-frozen (FF)/FFPE samples from 175 colorectal cancers to adapt the classifier to FFPE samples using a subset of genes found to be concordant between FF and FFPE; we then tested the classifier's reproducibility and repeatability and validated it in a CLIA-certified laboratory. We assessed the prognostic significance of CMS in 345 patients pooled across three clinical trials. RESULTS The best classifier was a weighted support vector machine, with high accuracy across platforms and gene lists (>0.95), and the 472-gene model outperformed existing classifiers. We constructed subsets of 99 and 200 genes with high FF/FFPE concordance and an adapted FFPE-based classifier that had strong classification accuracy (>80%) relative to "gold standard" CMS. The classifier was reproducible across sample types and RNA quality, and demonstrated poor prognosis for CMS1-3 and good prognosis for CMS2 in metastatic colorectal cancer (P < 0.001). CONCLUSIONS We developed and validated a colorectal cancer CMS assay that is ready for use in clinical trials, to assess prognosis in standard-of-care settings, and to explore as a predictor of therapy response.
Affiliation(s)
- Jeffrey S Morris
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
- Rajyalakshmi Luthra
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Yusha Liu
- Department of Biostatistics, University of Chicago School of Medicine, Chicago, Illinois
- Dzifa Y Duose
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Wonyul Lee
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Neelima G Reddy
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Huiqin Chen
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Zhimin Tong
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Baili Zhang
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Wei Wei
- Cleveland Clinic Foundation, Cleveland, Ohio
- Manyam Ganiraju
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Bradley M Broom
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Hector A Alvarez
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Alicia Mejia
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Omkara Veeranki
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Mark J Routbort
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Van K Morris
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Michael J Overman
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- David Menter
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Riham Katkhuda
- Department of Pathology, University of Chicago Medical Center, Chicago, Illinois
- Ignacio I Wistuba
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Jennifer S Davis
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Scott Kopetz
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Dipen M Maru
- Division of Pathology and Laboratory Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
12
Mass Spectrometry Imaging for Reliable and Fast Classification of Non-Small Cell Lung Cancer Subtypes. Cancers (Basel) 2020; 12:cancers12092704. [PMID: 32967325 PMCID: PMC7564257 DOI: 10.3390/cancers12092704] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/19/2020] [Revised: 08/25/2020] [Accepted: 09/16/2020] [Indexed: 02/07/2023]
Abstract
Simple Summary Diagnostic subtyping of non-small cell lung cancer is paramount for therapy stratification. Our study shows that subtyping into pulmonary adenocarcinoma and pulmonary squamous cell carcinoma by mass spectrometry imaging is rapid and accurate using limited tissue material. Abstract Subtyping of non-small cell lung cancer (NSCLC) is paramount for therapy stratification. In this study, we analyzed the largest NSCLC cohort studied by mass spectrometry imaging (MSI) to date. We sought to test different classification algorithms and to validate results obtained in smaller patient cohorts. Tissue microarrays (TMAs) including adenocarcinoma (ADC, n = 499) and squamous cell carcinoma (SqCC, n = 440) were analyzed. Linear discriminant analysis, support vector machine, and random forest (RF) were applied using samples randomly assigned for training (66%) and validation (33%). The m/z species most relevant for the classification were identified by on-tissue tandem mass spectrometry and validated by immunohistochemistry (IHC). Measurements from multiple TMAs were comparable using standardized protocols. RF yielded the best classification results. Classification accuracy decreased when fewer than six of the most relevant m/z species were included. The sensitivity and specificity of MSI in the validation cohort were 92.9% and 89.3%, respectively, comparable to IHC. The most important protein for discriminating the two tumor types was cytokeratin 5. We investigated the largest NSCLC cohort by MSI to date and found that NSCLC can be classified into ADC and SqCC with high accuracy using a limited set of m/z species.
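Sensitivity and specificity, reported above for the validation cohort, come from the confusion matrix of predicted versus reference subtypes. A minimal sketch, assuming SqCC is treated as the "positive" class (the abstract does not say which class was positive) and using made-up labels; the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive="SqCC"):
    """Return (sensitivity, specificity) for a two-class labeling.

    Assumption: `positive` names the class counted as a positive finding.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # positives caught
    fn = sum(t == positive and p != positive for t, p in pairs)  # positives missed
    tn = sum(t != positive and p != positive for t, p in pairs)  # negatives kept negative
    fp = sum(t != positive and p == positive for t, p in pairs)  # negatives flagged
    return tp / (tp + fn), tn / (tn + fp)


# Made-up validation labels: one SqCC core misclassified as ADC
y_true = ["SqCC"] * 4 + ["ADC"] * 4
y_pred = ["SqCC", "SqCC", "SqCC", "ADC", "ADC", "ADC", "ADC", "ADC"]
print(sensitivity_specificity(y_true, y_pred))
```

Swapping the `positive` argument to "ADC" exchanges the two numbers, so the choice of positive class matters when comparing MSI against IHC.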