1
Xiong N, Sun Q. Identifying COVID-19 subtypes by single-sample gene set enrichment analysis and providing guidance for sensitive drug selection. J Med Virol 2024; 96:e29497. [PMID: 38436142 DOI: 10.1002/jmv.29497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/02/2024] [Accepted: 02/20/2024] [Indexed: 03/05/2024]
Abstract
This study aimed at using single-sample gene set enrichment analysis scores to cluster naso/pharyngeal swab specimen samples from coronavirus disease 2019 (COVID-19) patients into two clusters. One cluster with higher fractions of immune cells and more active inflammatory-related pathways was called the Immunity-High (Immunity-H) group, and the other one was called the Immunity-Low group. We explored impacts of the method on COVID-19 treatment. First, given that the Immunity-H group was mainly enriched in inflammatory-related pathways and had higher fractions of inflammatory cells, the Immunity-H group may obtain more curative effects from anti-inflammatory treatment. Second, we searched some hot genes from the PubMed platform that had been studied by researchers and found these genes upregulated in the Immunity-H group, so we speculated the Immunity-H group and Immunity-Low group may have different curative effects from drugs targeting these genes. Finally, we screened out hub genes for the Immunity-H group and predicted potential drugs for these hub genes by a public data set (http://dgidb.genome.wustl.edu). These hub genes are significantly upregulated in the Immunity-H group and neutrophils so that the Immunity-H group may obtain different treatment results from potential drugs compared with the Immunity-Low group. Therefore, the cluster method may provide help in drug development and administration for COVID-19 patients.
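The two-group split described in this abstract can be sketched as follows. This is only an illustration: the abstract does not say which clustering algorithm was used, so plain k-means with k = 2 is swapped in here as an assumption, and the function names (`two_means`, `label_immunity`) and the toy scores are hypothetical. Real ssGSEA scores would come from a dedicated enrichment tool.

```python
from math import dist


def two_means(samples, iters=100):
    """Split samples (lists of per-gene-set scores) into two clusters with plain k-means."""
    # Deterministic start (an assumption): the samples with the lowest and highest total score.
    ordered = sorted(samples, key=sum)
    centers = [list(ordered[0]), list(ordered[-1])]
    labels = []
    for _ in range(iters):
        new_labels = [
            0 if dist(s, centers[0]) <= dist(s, centers[1]) else 1 for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def label_immunity(samples):
    """Name the cluster with the higher mean scores Immunity-H, the other Immunity-Low."""
    labels, centers = two_means(samples)
    high = 0 if sum(centers[0]) >= sum(centers[1]) else 1
    return ["Immunity-H" if lab == high else "Immunity-Low" for lab in labels]


# Toy scores for four samples across two immune gene sets (made-up numbers).
toy_scores = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.95]]
groups = label_immunity(toy_scores)
```

With these toy scores, the first two samples fall in the Immunity-Low group and the last two in the Immunity-H group.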
Affiliation(s)
- Nan Xiong
- Institute of Medical Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Kunming, China
- Graduate School of Kunming Medical University, Kunming, China
- Qiangming Sun
- Institute of Medical Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Kunming, China
- Yunnan Key Laboratory of Vaccine Research & Development on Severe Infectious Diseases, Kunming, China
2
Wen XY, Wang RY, Yu B, Yang Y, Yang J, Zhang HC. Integrating single-cell and bulk RNA sequencing to predict prognosis and immunotherapy response in prostate cancer. Sci Rep 2023; 13:15597. [PMID: 37730847 PMCID: PMC10511553 DOI: 10.1038/s41598-023-42858-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Accepted: 09/15/2023] [Indexed: 09/22/2023] Open
Abstract
Prostate cancer (PCa) stands as a prominent contributor to morbidity and mortality among males on a global scale. Cancer-associated fibroblasts (CAFs) are considered to be closely connected to tumour growth, invasion, and metastasis. We explored the role and characteristics of CAFs in PCa through bioinformatics analysis and built a CAFs-based risk model to predict prognostic treatment and treatment response in PCa patients. First, we downloaded the scRNA-seq data for PCa from the GEO. We extracted bulk RNA-seq data for PCa from the TCGA and GEO and adopted "ComBat" to remove batch effects. Then, we created a Seurat object for the scRNA-seq data using the package "Seurat" in R and identified CAF clusters based on the CAF-related genes (CAFRGs). Based on CAFRGs, a prognostic model was constructed by univariate Cox, LASSO, and multivariate Cox analyses. And the model was validated internally and externally by Kaplan-Meier analysis, respectively. We further performed GO and KEGG analyses of DEGs between risk groups. Besides, we investigated differences in somatic mutations between different risk groups. We explored differences in the immune microenvironment landscape and ICG expression levels in the different groups. Finally, we predicted the response to immunotherapy and the sensitivity of antitumour drugs between the different groups. We screened 4 CAF clusters and identified 463 CAFRGs in PCa scRNA-seq. We constructed a model containing 10 prognostic CAFRGs by univariate Cox, LASSO, and multivariate Cox analysis. Somatic mutation analysis revealed that TTN and TP53 were significantly more mutated in the high-risk group. Finally, we screened 31 chemotherapeutic drugs and targeted therapeutic drugs for PCa. In conclusion, we identified four clusters based on CAFs and constructed a new CAFs-based prognostic signature that could predict PCa patient prognosis and response to immunotherapy and might suggest meaningful clinical options for the treatment of PCa.
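A risk model of the kind described here typically reduces to a weighted sum of gene expression values. The sketch below assumes that form: the coefficients stand in for fitted multivariate Cox betas, the median cutoff for defining risk groups is a common convention that the abstract does not confirm, and the gene names and numbers are invented for illustration.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Return {patient: risk score}, where score = sum over genes of beta * expression.

    expression:   {patient: {gene: expression value}}
    coefficients: {gene: Cox regression coefficient} (hypothetical values here)
    """
    return {
        patient: sum(coefficients[g] * genes[g] for g in coefficients)
        for patient, genes in expression.items()
    }


def split_by_median(scores):
    """Assign 'high' to patients strictly above the median score, 'low' otherwise."""
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


# Made-up coefficients and expression values for two placeholder genes.
coefficients = {"G1": 0.5, "G2": -0.2}
expression = {
    "p1": {"G1": 2.0, "G2": 1.0},
    "p2": {"G1": 0.0, "G2": 1.0},
    "p3": {"G1": 4.0, "G2": 0.0},
}
scores = risk_scores(expression, coefficients)
risk_groups = split_by_median(scores)
```

The resulting groups would then be compared with Kaplan-Meier curves, as the abstract describes; that step is not shown here.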
Affiliation(s)
- Xiao Yan Wen
- Department of Urology, The Affiliated Hospital and Clinical Medical College of Chengdu University, No.82, North Second Section of Second Ring Road, Chengdu, 610081, Sichuan, China
- Ru Yi Wang
- Department of Urology, The Affiliated Hospital and Clinical Medical College of Chengdu University, No.82, North Second Section of Second Ring Road, Chengdu, 610081, Sichuan, China
- Bei Yu
- Department of Urology, The Affiliated Hospital and Clinical Medical College of Chengdu University, No.82, North Second Section of Second Ring Road, Chengdu, 610081, Sichuan, China
- Yue Yang
- Department of Urology, The Affiliated Hospital and Clinical Medical College of Chengdu University, No.82, North Second Section of Second Ring Road, Chengdu, 610081, Sichuan, China
- Jin Yang
- Department of Urology, The Affiliated Hospital and Clinical Medical College of Chengdu University, No.82, North Second Section of Second Ring Road, Chengdu, 610081, Sichuan, China
- Han Chao Zhang
- Department of Urology, The Affiliated Hospital and Clinical Medical College of Chengdu University, No.82, North Second Section of Second Ring Road, Chengdu, 610081, Sichuan, China.
- Medical College of Soochow University, Suzhou, 215000, Jiangsu, China.
3
van der Vaart AD, Wolstenholme JT, Smith ML, Harris GM, Lopez MF, Wolen AR, Becker HC, Williams RW, Miles MF. The allostatic impact of chronic ethanol on gene expression: A genetic analysis of chronic intermittent ethanol treatment in the BXD cohort. Alcohol 2017; 58:93-106. [PMID: 27838001 DOI: 10.1016/j.alcohol.2016.07.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/22/2015] [Revised: 07/06/2016] [Accepted: 07/07/2016] [Indexed: 11/25/2022]
Abstract
The transition from acute to chronic ethanol exposure leads to lasting behavioral and physiological changes such as increased consumption, dependence, and withdrawal. Changes in brain gene expression are hypothesized to underlie these adaptive responses to ethanol. Previous studies on acute ethanol identified genetic variation in brain gene expression networks and behavioral responses to ethanol across the BXD panel of recombinant inbred mice. In this work, we have performed the first joint genetic and genomic analysis of transcriptome shifts in response to chronic intermittent ethanol (CIE) by vapor chamber exposure in a BXD cohort. CIE treatment is known to produce significant and sustained changes in ethanol consumption with repeated cycles of ethanol vapor. Using Affymetrix microarray analysis of prefrontal cortex (PFC) and nucleus accumbens (NAC) RNA, we compared CIE expression responses to those seen following acute ethanol treatment, and to voluntary ethanol consumption. Gene expression changes in PFC and NAC after CIE overlapped significantly across brain regions and with previously published expression following acute ethanol. Genes highly modulated by CIE were enriched for specific biological processes including synaptic transmission, neuron ensheathment, intracellular signaling, and neuronal projection development. Expression quantitative trait locus (eQTL) analyses identified genomic loci associated with ethanol-induced transcriptional changes with largely distinct loci identified between brain regions. Correlating CIE-regulated genes to ethanol consumption data identified specific genes highly associated with variation in the increase in drinking seen with repeated cycles of CIE. In particular, multiple myelin-related genes were identified. Furthermore, genetic variance in or near dynamin3 (Dnm3) on Chr1 at ∼164 Mb may have a major regulatory role in CIE-responsive gene expression. 
Dnm3 expression correlates significantly with ethanol consumption, is contained in a highly ranked functional group of CIE-regulated genes in the NAC, and has a cis-eQTL within a genomic region linked with multiple CIE-responsive genes.
4
McDaneld TG, Kuehn LA, Thomas MG, Snelling WM, Smith TPL, Pollak EJ, Cole JB, Keele JW. Genomewide association study of reproductive efficiency in female cattle. J Anim Sci 2015; 92:1945-57. [PMID: 24782394 DOI: 10.2527/jas.2012-6807] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 02/04/2023] Open
Abstract
Reproductive efficiency is of economic importance in commercial beef cattle production, as failure to achieve pregnancy reduces the number of calves marketed per cow exposed. Identification of genetic markers with predictive merit for reproductive success would facilitate early selection of sires with daughters having improved reproductive rate without increasing generation intervals. To identify regions of the genome harboring variation affecting reproductive success, we applied a genomewide association study (GWAS) approach based on the >700,000 SNP marker assay, using a procedure based on genotyping multianimal pools of DNA to increase the number of animals that could be genotyped with available resources. Cows from several populations were classified according to reproductive efficiency, and DNA was pooled within population and phenotype prior to genotyping. Populations evaluated included a research population at the U.S. Meat Animal Research Center, 2 large commercial ranch populations, and a number of smaller populations (<100 head) across the United States. We detected 2 SNP with significant genomewide association (P ≤ 1.49 × 10⁻⁷), on BTA21 and BTA29, 3 SNP with suggestive associations (P ≤ 2.91 × 10⁻⁶) on BTA5, and 1 SNP with suggestive association each on BTA1 and BTA25. In addition to our novel findings, we confirmed previously published associations for SNP on BTA-X and all autosomes except 3 (BTA21, BTA22, and BTA28) encompassing substantial breed diversity including Bos indicus and Bos taurus breeds. The study identified regions of the genome associated with reproductive efficiency, which are being targeted for further analysis to develop robust marker systems, and demonstrated that DNA pooling can be used to substantially reduce the cost of GWAS in cattle.
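To make the pooled comparison concrete, here is a minimal sketch of testing one SNP for an allele frequency difference between two phenotype pools. It is not the study's actual statistic, which the abstract does not give: it is the textbook two-proportion z-test, it assumes each animal contributes two chromosomes, and it ignores the extra measurement error that pooled genotyping introduces. The function name and the input numbers are hypothetical.

```python
from math import erfc, sqrt


def pooled_allele_z_test(freq_a, n_a, freq_b, n_b):
    """Two-proportion z-test on allele frequencies estimated from two DNA pools.

    freq_a, freq_b: estimated frequency of the same allele in each pool
    n_a, n_b:       number of diploid animals in each pool (2 chromosomes each)
    Returns (z, two-sided p-value). Pooling measurement error is NOT modelled.
    """
    chrom_a, chrom_b = 2 * n_a, 2 * n_b
    pooled = (freq_a * chrom_a + freq_b * chrom_b) / (chrom_a + chrom_b)
    se = sqrt(pooled * (1 - pooled) * (1 / chrom_a + 1 / chrom_b))
    z = (freq_a - freq_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Made-up example: 100 animals per pool, allele frequency 0.6 versus 0.4.
z, p = pooled_allele_z_test(0.6, 100, 0.4, 100)
is_genomewide = p <= 1.49e-7  # threshold quoted in the abstract
```

In this toy case z is about 4, which is nominally significant but far from the genomewide threshold quoted above, showing how stringent that threshold is.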
Affiliation(s)
- T G McDaneld
- USDA, ARS, U.S. Meat Animal Research Center, Clay Center, NE 68933
5
Thomas DC, Yang Z, Yang F. Two-phase and family-based designs for next-generation sequencing studies. Front Genet 2013; 4:276. [PMID: 24379824 PMCID: PMC3861783 DOI: 10.3389/fgene.2013.00276] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 09/28/2013] [Accepted: 11/19/2013] [Indexed: 12/21/2022] Open
Abstract
The cost of next-generation sequencing is now approaching that of early GWAS panels, but is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genomewide association studies using unrelated individuals; and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an on-going study of second breast cancers following radiotherapy. While the sharing of variants within families means that family-based designs are less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal ones. Furthermore, by enriching for family history, the yield of causal variants can be improved and use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. While associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
- Zhao Yang
- Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
- Fan Yang
- Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
6
Abstract
Background: Modern genetics has been transformed by high-throughput sequencing. New experimental designs in model organisms involve analyzing many individuals, pooled and sequenced in groups for increased efficiency. However, the uncertainty from pooling and the challenge of noisy sequencing data demand advanced computational methods. Results: We present MULTIPOOL, a computational method for genetic mapping in model organism crosses that are analyzed by pooled genotyping. Unlike other methods for the analysis of pooled sequence data, we simultaneously consider information from all linked chromosomal markers when estimating the location of a causal variant. Our use of informative sequencing reads is formulated as a discrete dynamic Bayesian network, which we extend with a continuous approximation that allows for rapid inference without a dependence on the pool size. MULTIPOOL generalizes to include biological replicates and case-only or case-control designs for binary and quantitative traits. Conclusions: Our increased information sharing and principled inclusion of relevant error sources improve resolution and accuracy when compared to existing methods, localizing associations to single genes in several cases. MULTIPOOL is freely available at http://cgs.csail.mit.edu/multipool/.
Affiliation(s)
- Matthew D Edwards
- Computer Science and Artificial Intelligence Laboratory, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
7
FHL2 interacts with CALM and is highly expressed in acute erythroid leukemia. Blood Cancer J 2011; 1:e42. [PMID: 22829078 PMCID: PMC3256755 DOI: 10.1038/bcj.2011.40] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/13/2011] [Accepted: 08/12/2011] [Indexed: 12/22/2022] Open
Abstract
The t(10;11)(p13;q14) translocation results in the fusion of the CALM (clathrin assembly lymphoid myeloid leukemia protein) and AF10 genes. This translocation is observed in acute myeloblastic leukemia (AML M6), acute lymphoblastic leukemia (ALL) and malignant lymphoma. Using a yeast two-hybrid screen, the four and a half LIM domain protein 2 (FHL2) was identified as a CALM interacting protein. Recently, high expression of FHL2 in breast, gastric, colon, lung as well as in prostate cancer was shown to be associated with an adverse prognosis. The interaction between CALM and FHL2 was confirmed by glutathione S-transferase-pulldown assay and co-immunoprecipitation experiments. The FHL2 interaction domain of CALM was mapped to amino acids 294–335 of CALM. The transcriptional activation capacity of FHL2 was reduced by CALM, but not by CALM/AF10, which suggests that regulation of FHL2 by CALM might be disturbed in CALM/AF10-positive leukemia. Extremely high expression of FHL2 was seen in acute erythroid leukemia (AML M6). FHL2 was also highly expressed in chronic myeloid leukemia and in AML with complex aberrant karyotype. These results suggest that FHL2 may play an important role in leukemogenesis, especially in the case of AML M6.
8
Gasbarra D, Kulathinal S, Pirinen M, Sillanpää MJ. Estimating haplotype frequencies by combining data from large DNA pools with database information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:36-44. [PMID: 21071795 DOI: 10.1109/tcbb.2009.71] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 05/30/2023]
Abstract
We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the performance of the proposed method is similar to that of an EM-algorithm, which uses a multinormal approximation for the pooled allele frequencies, but which does not utilize prior information about the haplotypes. The method has been implemented using Matlab and the code is available upon request from the authors.
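The reason prior knowledge about haplotypes helps can be seen from the forward relationship between haplotype frequencies and the allele frequencies a pool reveals. The sketch below only computes that expected mapping for a single pool; it is not the Bayesian method of the paper, and the function name and toy frequencies are hypothetical.

```python
def expected_allele_frequencies(haplotype_freqs):
    """Map haplotype frequencies to the expected per-locus frequency of allele 1.

    haplotype_freqs: {haplotype as a tuple of 0/1 alleles: frequency}, summing to 1.
    """
    n_loci = len(next(iter(haplotype_freqs)))
    return [
        sum(freq * hap[j] for hap, freq in haplotype_freqs.items())
        for j in range(n_loci)
    ]


# Two different haplotype distributions over two loci (made-up numbers) ...
all_four = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
only_two = {(0, 0): 0.5, (1, 1): 0.5}

# ... give identical expected allele frequencies at both loci.
freqs_all_four = expected_allele_frequencies(all_four)
freqs_only_two = expected_allele_frequencies(only_two)
```

Both distributions yield an expected frequency of 0.5 at each locus, so allele frequencies from one pool cannot separate them. Restricting the candidate haplotypes with a database such as HapMap, as the abstract proposes, is one way to narrow down such ambiguity.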
Affiliation(s)
- Dario Gasbarra
- Department of Mathematics and Statistics, University of Helsinki, FIN 00014 Helsinki, Finland.
9
Thomas DC, Casey G, Conti DV, Haile RW, Lewinger JP, Stram DO. Methodological Issues in Multistage Genome-wide Association Studies. Stat Sci 2009; 24:414-429. [PMID: 20607129 DOI: 10.1214/09-sts288] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
Abstract
Because of the high cost of commercial genotyping chip technologies, many investigations have used a two-stage design for genome-wide association studies, using part of the sample for an initial discovery of "promising" SNPs at a less stringent significance level and the remainder in a joint analysis of just these SNPs using custom genotyping. Typical cost savings of about 50% are possible with this design to obtain comparable levels of overall type I error and power by using about half the sample for stage I and carrying about 0.1% of SNPs forward to the second stage, the optimal design depending primarily upon the ratio of costs per genotype for stages I and II. However, with the rapidly declining costs of the commercial panels, the generally low observed ORs of current studies, and many studies aiming to test multiple hypotheses and multiple endpoints, many investigators are abandoning the two-stage design in favor of simply genotyping all available subjects using a standard high-density panel. Concern is sometimes raised about the absence of a "replication" panel in this approach, as required by some high-profile journals, but it must be appreciated that the two-stage design is not a discovery/replication design but simply a more efficient design for discovery using a joint analysis of the data from both stages. Once a subset of highly-significant associations has been discovered, a truly independent "exact replication" study is needed in a similar population of the same promising SNPs using similar methods. This can then be followed by (1) "generalizability" studies to assess the full scope of replicated associations across different races, different endpoints, different interactions, etc.; (2) fine-mapping or re-sequencing to try to identify the causal variant; and (3) experimental studies of the biological function of these genes. 
Multistage sampling designs may be more useful at this stage, say for selecting subsets of subjects for deep re-sequencing of regions identified in the GWAS.
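The "about 50%" saving quoted in this abstract can be checked with simple arithmetic. The sketch below assumes genotyping cost is proportional to subjects × SNPs × per-genotype cost and ignores fixed costs; the function name is hypothetical, and the stage II to stage I cost ratio of 10 is an illustrative assumption, not a figure from the abstract.

```python
def two_stage_relative_cost(stage1_fraction, snp_fraction_forward, cost_ratio):
    """Cost of a two-stage design relative to genotyping everyone on the full panel.

    stage1_fraction:      share of subjects genotyped on the full panel in stage I
    snp_fraction_forward: share of SNPs carried forward to stage II
    cost_ratio:           per-genotype cost in stage II divided by that in stage I
    """
    stage1_cost = stage1_fraction
    stage2_cost = (1 - stage1_fraction) * snp_fraction_forward * cost_ratio
    return stage1_cost + stage2_cost


# Half the sample in stage I, 0.1% of SNPs forward, custom genotyping assumed
# to cost 10 times more per genotype than the commercial panel.
relative_cost = two_stage_relative_cost(0.5, 0.001, 10)
savings = 1 - relative_cost
```

Under these assumptions the two-stage design costs about 50.5% of the one-stage design, consistent with the quoted saving. The second term grows with the cost ratio, which is why the abstract names that ratio as the main driver of the optimal design.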
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California
10
Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm. Genet Res (Camb) 2009; 90:509-24. [PMID: 19123969 DOI: 10.1017/s0016672308009877] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 01/15/2023] Open
Abstract
Recent studies show that the PHASE algorithm is a state-of-the-art method for population-based haplotyping from individually genotyped data. We present a modified version of PHASE for estimating population haplotype frequencies from pooled DNA data. The algorithm is compared with (i) a maximum likelihood estimation under the multinomial model and (ii) a deterministic greedy algorithm, on both simulated and real data sets (HapMap data). Our results suggest that the PHASE algorithm is a method of choice also on pooled DNA data. The main reason for improvement over the other approaches is assumed to be the same as with individually genotyped data: the biologically motivated model of PHASE takes into account correlated genealogical histories of the haplotypes by modelling mutations and recombinations. The important questions of efficiency of DNA pooling as well as influence of the pool size on the accuracy of the estimates are also considered. Our results are in line with the earlier findings in that the pool size should be relatively small, only 2-5 individuals in our examples, in order to provide reliable estimates of population haplotype frequencies.
11
Homer N, Tembe WD, Szelinger S, Redman M, Stephan DA, Pearson JV, Nelson SF, Craig D. Multimarker analysis and imputation of multiple platform pooling-based genome-wide association studies. Bioinformatics 2008; 24:1896-902. [PMID: 18617537 PMCID: PMC2732219 DOI: 10.1093/bioinformatics/btn333] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 01/18/2008] [Revised: 06/26/2008] [Accepted: 06/27/2008] [Indexed: 12/26/2022] Open
Abstract
For many genome-wide association (GWA) studies individually genotyping one million or more SNPs provides a marginal increase in coverage at a substantial cost. Much of the information gained is redundant due to the correlation structure inherent in the human genome. Pooling-based GWA studies could benefit significantly by utilizing this redundancy to reduce noise, improve the accuracy of the observations and increase genomic coverage. We introduce a measure of correlation between individual genotyping and pooling, under the same framework that r² provides a measure of linkage disequilibrium (LD) between pairs of SNPs. We then report a new non-haplotype multimarker multi-loci method that leverages the correlation structure between SNPs in the human genome to increase the efficacy of pooling-based GWA studies. We first give a theoretical framework and derivation of our multimarker method. Next, we evaluate simulations using this multimarker approach in comparison to single marker analysis. Finally, we experimentally evaluate our method using different pools of HapMap individuals on the Illumina 450S Duo, Illumina 550K and Affymetrix 5.0 platforms for a combined total of 1 333 631 SNPs. Our results show that use of multimarker analysis reduces noise specific to pooling-based studies, allows for efficient integration of multiple microarray platforms and provides more accurate measures of significance than single marker analysis. Additionally, this approach can be extended to allow for imputing the association significance for SNPs not directly observed using neighboring SNPs in LD. This multimarker method can now be used to cost-effectively complete pooling-based GWA studies with multiple platforms across over one million SNPs and to impute neighboring SNPs weighted for the loss of information due to pooling.
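For readers unfamiliar with the LD measure this abstract builds on, the standard pairwise r² can be computed from haplotype and allele frequencies as sketched below. This is the conventional definition only, not the paper's new pooling-specific correlation measure, and the function name and example frequencies are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Standard r^2 linkage disequilibrium between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Perfectly correlated SNPs: A and B always travel together.
r2_perfect = ld_r_squared(0.5, 0.5, 0.5)
# Independent SNPs: the haplotype frequency equals the product of allele frequencies.
r2_independent = ld_r_squared(0.25, 0.5, 0.5)
```

An r² near 1 means one SNP is almost fully informative about the other, which is the redundancy the multimarker method exploits; an r² of 0 means the SNPs carry no information about each other.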
Affiliation(s)
- Nils Homer
- Translational Genomics Research Institute (TGen), Phoenix, AZ 85004, USA