1
A temporal gradient of cytonuclear coordination of chaperonins and chaperones during RuBisCo biogenesis in allopolyploid plants. Proc Natl Acad Sci U S A 2022; 119:e2200106119. [PMID: 35969751 PMCID: PMC9407610 DOI: 10.1073/pnas.2200106119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCo), consisting of subunits encoded by nuclear and cytoplasmic genes, is a model for cytonuclear evolution in plant allopolyploids. To date, coordinated cytonuclear evolutionary responses of auxiliary cofactors involved in RuBisCo biogenesis remain unexplored. This study characterized and compared genomic and transcriptional cytonuclear coevolutionary responses of chaperonin/chaperones in RuBisCo folding and assembly processes across different allopolyploids. We discovered significant cytonuclear evolutionary responses in folding cofactors, with diminishing or attenuated responses later during assembly. Our results have general significance for understanding the unrecognized cytonuclear evolution of chaperonin/chaperone genes, structural and functional features of intermediate complexes, and the functioning stage of the Raf2 cofactor. More broadly, the results reveal a hitherto unexplored dimension of allopolyploidy in plants.
Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCo) has long been studied from many perspectives. As a multisubunit (large subunits [LSUs] and small subunits [SSUs]) protein encoded by genes residing in the chloroplast (rbcL) and nuclear (rbcS) genomes, RuBisCo is also a model for cytonuclear coevolution following allopolyploid speciation in plants. Here, we studied the genomic and transcriptional cytonuclear coordination of auxiliary chaperonin and chaperones that facilitate RuBisCo biogenesis across multiple natural and artificially synthesized plant allopolyploids. We found similar genomic and transcriptional cytonuclear responses, including respective paternal-to-maternal conversions and maternal homeologous biased expression, in chaperonin/chaperone-assisted folding and assembly of RuBisCo in different allopolyploids.
A notable observation is that the genomic and transcriptional cytonuclear evolutionary responses attenuate over time, from the early folding stage to the later assembly stage of RuBisCo biogenesis; these responses were established by long-term evolution and by the immediate onset of allopolyploidy, respectively. Our study not only points to potentially widespread and hitherto unrecognized features of cytonuclear evolution but also bears implications for the structural interaction interface between LSU and Cpn60 chaperonin and the functioning stage of the Raf2 chaperone.
2
Luzuriaga-Neira A, Subramanian K, Alvarez-Ponce D. Functional compensation of mouse duplicates by their paralogs expressed in the same tissues. Genome Biol Evol 2022; 14:evac126. [PMID: 35945673 PMCID: PMC9387915 DOI: 10.1093/gbe/evac126] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Accepted: 07/30/2022] [Indexed: 11/14/2022]
Abstract
Analyses in a number of organisms have shown that duplicated genes are less likely to be essential than singletons. This implies that genes can often compensate for the loss of their paralogs. However, it is unclear why the loss of some duplicates can be compensated by their paralogs, whereas the loss of other duplicates cannot. Surprisingly, initial analyses in mice did not detect differences in the essentiality of duplicates and singletons. Only subsequent analyses, using larger gene knockout datasets and controlling for a number of confounding factors, did detect significant differences. Previous studies have not taken into account the tissues in which duplicates are expressed. We hypothesized that in complex organisms, in order for a gene's loss to be compensated by one or more of its paralogs, such paralogs need to be expressed in at least the same set of tissues as the lost gene. To test our hypothesis, we classified mouse duplicates into two categories based on the expression patterns of their paralogs: "compensable duplicates" (those with paralogs expressed in all the tissues in which the gene is expressed) and "non-compensable duplicates" (those whose paralogs are not expressed in all the tissues where the gene is expressed). In agreement with our hypothesis, the essentiality of non-compensable duplicates is similar to that of singletons, whereas compensable duplicates exhibit a substantially lower essentiality. Our results imply that duplicates can often compensate for the loss of their paralogs, but only if they are expressed in the same tissues. Indeed, the compensation ability is more dependent on expression patterns than on protein sequence similarity. The existence of these two kinds of duplicates with different essentialities, which has been overlooked by prior studies, may have hindered the detection of differences between singletons and duplicates.
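The two-category classification described in the abstract above can be sketched as a set-inclusion test. This is only an illustration, not the authors' pipeline: the function name, the tissue labels, and the choice to require that a single paralog covers every tissue of the gene (rather than the union of all paralogs) are assumptions made here.

```python
from typing import Iterable, Set


def classify_duplicate(gene_tissues: Set[str],
                       paralog_tissues: Iterable[Set[str]]) -> str:
    """Label a duplicated gene by the expression breadth of its paralogs.

    Assumption (not stated in the abstract): a gene counts as
    "compensable" when at least ONE paralog is expressed in every
    tissue where the gene itself is expressed.
    """
    gene_tissues = set(gene_tissues)
    for tissues in paralog_tissues:
        # Subset test: all of the gene's tissues are covered by this paralog.
        if gene_tissues <= set(tissues):
            return "compensable"
    return "non-compensable"


# Hypothetical tissue sets, for illustration only.
covered = classify_duplicate({"liver", "brain"},
                             [{"liver"}, {"liver", "brain", "heart"}])
uncovered = classify_duplicate({"liver", "brain"},
                               [{"liver"}, {"brain"}])
```

Under the alternative reading, in which paralogs may jointly cover the gene's tissues, the loop would be replaced by a subset test against the union of all paralog tissue sets.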
3
Dressler L, Bortolomeazzi M, Keddar MR, Misetic H, Sartini G, Acha-Sagredo A, Montorsi L, Wijewardhane N, Repana D, Nulsen J, Goldman J, Pollitt M, Davis P, Strange A, Ambrose K, Ciccarelli FD. Comparative assessment of genes driving cancer and somatic evolution in non-cancer tissues: an update of the Network of Cancer Genes (NCG) resource. Genome Biol 2022; 23:35. [PMID: 35078504 PMCID: PMC8790917 DOI: 10.1186/s13059-022-02607-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/30/2021] [Accepted: 01/10/2022] [Indexed: 12/30/2022]
Abstract
Background Genetic alterations of somatic cells can drive non-malignant clone formation and promote cancer initiation. However, the link between these processes remains unclear and hampers our understanding of tissue homeostasis and cancer development. Results Here, we collect a literature-based repertoire of 3355 well-known or predicted drivers of cancer and non-cancer somatic evolution in 122 cancer types and 12 non-cancer tissues. Mapping the alterations of these genes in 7953 pan-cancer samples reveals that, despite the large size, the known compendium of drivers is still incomplete and biased towards frequently occurring coding mutations. High overlap exists between drivers of cancer and non-cancer somatic evolution, although significant differences emerge in their recurrence. We confirm and expand the unique properties of drivers and identify a core of evolutionarily conserved and essential genes whose germline variation is strongly counter-selected. Somatic alteration in even one of these genes is sufficient to drive clonal expansion but not malignant transformation. Conclusions Our study offers a comprehensive overview of our current understanding of the genetic events initiating clone expansion and cancer revealing significant gaps and biases that still need to be addressed. The compendium of cancer and non-cancer somatic drivers, their literature support, and properties are accessible in the Network of Cancer Genes and Healthy Drivers resource at http://www.network-cancer-genes.org/. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-022-02607-z.
4
Three topological features of regulatory networks control life-essential and specialized subsystems. Sci Rep 2021; 11:24209. [PMID: 34930908 PMCID: PMC8688434 DOI: 10.1038/s41598-021-03625-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/10/2021] [Accepted: 12/07/2021] [Indexed: 11/08/2022]
Abstract
Gene regulatory networks (GRNs) play key roles in development, phenotype plasticity, and evolution. Although graph theory has been used to explore GRNs, associations amongst topological features, transcription factors (TFs), and systems essentiality are poorly understood. Here we sought the relationship amongst the main GRN topological features that influence the control of essential and specific subsystems. We found that Knn, PageRank, and degree are the most relevant GRN features: they are conserved through evolution and are also relevant in pluripotent cells. Interestingly, life-essential subsystems are governed mainly by TFs with intermediate Knn and high PageRank or degree, whereas specialized subsystems are mainly regulated by TFs with low Knn. Hence, we suggest that the high probability of TFs being visited by a random signal, together with the high probability of signal propagation to target genes, ensures the robustness of life-essential subsystems. Gene/genome duplication is the main evolutionary process responsible for making Knn the most relevant feature. Herein, we shed light on unexplored topological GRN features to assess how they are related to subsystems and how duplications shaped the regulatory systems through evolution. The classification model generated can be found here: https://github.com/ivanrwolf/NoC/ .
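As a rough illustration of the three topological features named in the abstract above, the sketch below computes degree, Knn, and page rank on a toy edge list in plain Python. It is not taken from the linked repository. It assumes that Knn denotes the average degree of a node's nearest neighbours, treats edges as undirected for degree and Knn, and uses a conventional damping factor of 0.85 for page rank; all node names are hypothetical.

```python
def degree_and_knn(edges):
    """Return (degree, knn) dicts, treating every edge as undirected."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = {v: len(nb) for v, nb in neighbors.items()}
    # Knn (assumed definition): mean degree of a node's neighbours.
    knn = {v: sum(degree[u] for u in nb) / len(nb)
           for v, nb in neighbors.items()}
    return degree, knn


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration page rank on directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank


# Toy network: two hypothetical TFs regulating two hypothetical genes.
toy_edges = [("TF1", "g1"), ("TF1", "g2"), ("TF2", "g2")]
degree, knn = degree_and_knn(toy_edges)
ranks = pagerank(toy_edges)
```

In this toy network, TF1 has degree 2 and its neighbours have degrees 1 and 2, so its Knn is 1.5.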
5
Mottes F, Villa C, Osella M, Caselle M. The impact of whole genome duplications on the human gene regulatory networks. PLoS Comput Biol 2021; 17:e1009638. [PMID: 34871317 PMCID: PMC8675932 DOI: 10.1371/journal.pcbi.1009638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/19/2021] [Revised: 12/16/2021] [Accepted: 11/12/2021] [Indexed: 11/17/2022]
Abstract
This work studies the effects of the two rounds of Whole Genome Duplication (WGD) at the origin of the vertebrate lineage on the architecture of the human gene regulatory networks. We integrate information on transcriptional regulation, miRNA regulation, and protein-protein interactions to comparatively analyse the role of WGD and Small Scale Duplications (SSD) in the structural properties of the resulting multilayer network. We show that complex network motifs, such as combinations of feed-forward loops and bifan arrays, deriving from WGD events are specifically enriched in the network. Pairs of WGD-derived proteins display a strong tendency to interact both with each other and with common partners and WGD-derived transcription factors play a prominent role in the retention of a strong regulatory redundancy. Combinatorial regulation and synergy between different regulatory layers are in general enhanced by duplication events, but the two types of duplications contribute in different ways. Overall, our findings suggest that the two WGD events played a substantial role in increasing the multi-layer complexity of the vertebrate regulatory network by enhancing its combinatorial organization, with potential consequences on its overall robustness and ability to perform high-level functions like signal integration and noise control. Lastly, we discuss in detail the RAR/RXR pathway as an illustrative example of the evolutionary impact of WGD duplications in human. Gene duplication is one of the main mechanisms driving genome evolution. The duplication of a genomic segment can be the result of a local event, involving only a small portion of the genome, or of a dramatic duplication of the whole genome, which is however only rarely retained. All vertebrates descend from two rounds of Whole-Genome Duplication (WGD) that occurred approximately 500 Mya. 
We show that these events influenced the evolution of different human gene regulatory networks in unique ways, with sizeable effects on their current structure. We find that WGDs statistically increased the presence of specific classes of simple genetic circuits, considered to be fundamental building blocks of more sophisticated circuitry and commonly associated with complex functions. Our findings support the hypothesis that these rare, large-scale events have played a substantial role in the emergence of complex traits in vertebrates.
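The feed-forward loops mentioned in the abstract above are three-node motifs in which a regulator A targets both B and C, and B also targets C. A minimal sketch of counting them in a directed edge list follows. It is an illustration only, not the authors' enrichment analysis (which would compare such counts against a null model), and the node names are hypothetical.

```python
def count_feed_forward_loops(edges):
    """Count feed-forward loops (A->B, B->C, A->C) in a directed edge list."""
    successors = {}
    for src, dst in edges:
        if src != dst:  # ignore self-regulation
            successors.setdefault(src, set()).add(dst)
    count = 0
    for a, a_targets in successors.items():
        for b in a_targets:
            for c in successors.get(b, ()):
                # A regulates C directly and also indirectly through B.
                if c != a and c in a_targets:
                    count += 1
    return count


# Hypothetical regulator names, for illustration only.
toy_network = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
n_ffl = count_feed_forward_loops(toy_network)
```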
Affiliation(s)
- Chiara Villa
  - School of Mathematics and Statistics, University of St Andrews, Mathematical Institute, North Haugh, St Andrews, United Kingdom
- Matteo Osella
  - Department of Physics, University of Turin & INFN, Turin, Italy
- Michele Caselle
  - Department of Physics, University of Turin & INFN, Turin, Italy
6
Nulsen J, Misetic H, Yau C, Ciccarelli FD. Pan-cancer detection of driver genes at the single-patient resolution. Genome Med 2021; 13:12. [PMID: 33517897 PMCID: PMC7849133 DOI: 10.1186/s13073-021-00830-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/07/2020] [Accepted: 01/08/2021] [Indexed: 12/17/2022]
Abstract
BACKGROUND Identifying the complete repertoire of genes that drive cancer in individual patients is crucial for precision oncology. Most established methods identify driver genes that are recurrently altered across patient cohorts. However, mapping these genes back to patients leaves a sizeable fraction with few or no drivers, hindering our understanding of cancer mechanisms and limiting the choice of therapeutic interventions. RESULTS We present sysSVM2, a machine learning software that integrates cancer genetic alterations with gene systems-level properties to predict drivers in individual patients. Using simulated pan-cancer data, we optimise sysSVM2 for application to any cancer type. We benchmark its performance on real cancer data and validate its applicability to a rare cancer type with few known driver genes. We show that drivers predicted by sysSVM2 have a low false-positive rate, are stable and disrupt well-known cancer-related pathways. CONCLUSIONS sysSVM2 can be used to identify driver alterations in patients lacking sufficient canonical drivers or belonging to rare cancer types for which assembling a large enough cohort is challenging, furthering the goals of precision oncology. As resources for the community, we provide the code to implement sysSVM2 and the pre-trained models in all TCGA cancer types ( https://github.com/ciccalab/sysSVM2 ).
Affiliation(s)
- Joel Nulsen
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
- Hrvoje Misetic
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
- Christopher Yau
  - School of Health Sciences, University of Manchester, Manchester, M13 9PL, UK
  - The Alan Turing Institute, London, NW1 2DB, UK
- Francesca D Ciccarelli
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
7
Defoort J, Van de Peer Y, Carretero-Paulet L. The Evolution of Gene Duplicates in Angiosperms and the Impact of Protein-Protein Interactions and the Mechanism of Duplication. Genome Biol Evol 2020; 11:2292-2305. [PMID: 31364708 PMCID: PMC6735927 DOI: 10.1093/gbe/evz156] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Accepted: 07/10/2019] [Indexed: 01/17/2023]
Abstract
Gene duplicates, generated through either whole genome duplication (WGD) or small-scale duplication (SSD), are prominent in angiosperms and are believed to play an important role in adaptation and in generating evolutionary novelty. Previous studies reported contrasting evolutionary and functional dynamics of duplicate genes depending on the mechanism of origin, a behavior that is hypothesized to stem from constraints to maintain the relative dosage balance between the genes concerned and their interaction context. However, the mechanisms ultimately influencing loss and retention of gene duplicates over evolutionary time are not yet fully elucidated. Here, by using a robust classification of gene duplicates in Arabidopsis thaliana, Solanum lycopersicum, and Zea mays, large RNAseq expression compendia and an extensive protein-protein interaction (PPI) network from Arabidopsis, we investigated the impact of PPIs on the differential evolutionary and functional fate of WGD and SSD duplicates. In all three species, retained WGD duplicates show stronger constraints to diverge at the sequence and expression level than SSD ones, a pattern that is also observed for shared PPI partners between Arabidopsis duplicates. PPIs are preferentially distributed among WGD duplicates and specific functional categories. Furthermore, duplicates with PPIs tend to be under stronger constraints to evolve than their counterparts without PPIs regardless of their mechanism of origin. Our results support dosage balance constraint as a specific property of genes involved in biological interactions, including physical PPIs, and suggest that additional factors may be differently influencing the evolution of genes following duplication, depending on the species, time, and mechanism of origin.
Affiliation(s)
- Jonas Defoort
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Belgium
- Yves Van de Peer
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Belgium
  - Department of Biochemistry, Genetics and Microbiology, University of Pretoria, South Africa
- Lorenzo Carretero-Paulet
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Belgium
  - VIB Center for Plant Systems Biology, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Belgium
8
Abstract
Alterations in membrane proteins (MPs) and their regulated pathways have been established as cancer hallmarks and extensively targeted in clinical applications. However, the analysis of MP-interacting proteins and downstream pathways across human malignancies remains challenging. Here, we present a systematically integrated method to generate a resource of cancer membrane protein-regulated networks (CaMPNets), containing 63,746 high-confidence protein-protein interactions (PPIs) for 1962 MPs, using expression profiles from 5922 tumors with overall survival outcomes across 15 human cancers. Comprehensive analysis of CaMPNets links MP partner communities and regulated pathways to provide MP-based gene sets for identifying prognostic biomarkers and druggable targets. For example, we identify CHRNA9, which has 12 PPIs (e.g., with ERBB2), as a potential therapeutic target and find an anti-metastasis agent, bupropion, for the treatment of nicotine-induced breast cancer. This resource systematically integrates MP interactions, genomics, and clinical outcomes to help illuminate a cancer-wide atlas and the prognostic landscapes of tumor homo/heterogeneity.
9
Mourikis TP, Benedetti L, Foxall E, Temelkovski D, Nulsen J, Perner J, Cereda M, Lagergren J, Howell M, Yau C, Fitzgerald RC, Scaffidi P, Ciccarelli FD. Patient-specific cancer genes contribute to recurrently perturbed pathways and establish therapeutic vulnerabilities in esophageal adenocarcinoma. Nat Commun 2019; 10:3101. [PMID: 31308377 PMCID: PMC6629660 DOI: 10.1038/s41467-019-10898-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 11/09/2018] [Accepted: 06/04/2019] [Indexed: 12/25/2022]
Abstract
The identification of cancer-promoting genetic alterations is challenging particularly in highly unstable and heterogeneous cancers, such as esophageal adenocarcinoma (EAC). Here we describe a machine learning algorithm to identify cancer genes in individual patients considering all types of damaging alterations simultaneously. Analysing 261 EACs from the OCCAMS Consortium, we discover helper genes that, alongside well-known drivers, promote cancer. We confirm the robustness of our approach in 107 additional EACs. Unlike recurrent alterations of known drivers, these cancer helper genes are rare or patient-specific. However, they converge towards perturbations of well-known cancer processes. Recurrence of the same process perturbations, rather than individual genes, divides EACs into six clusters differing in their molecular and clinical features. Experimentally mimicking the alterations of predicted helper genes in cancer and pre-cancer cells validates their contribution to disease progression, while reverting their alterations reveals EAC acquired dependencies that can be exploited in therapy.
Affiliation(s)
- Thanos P Mourikis
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
- Lorena Benedetti
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
- Elizabeth Foxall
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
- Damjan Temelkovski
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
- Joel Nulsen
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
- Juliane Perner
  - MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, CB2 0XZ, UK
- Matteo Cereda
  - Italian Institute for Genomic Medicine (IIGM), Turin, 10126, Italy
- Jesper Lagergren
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
- Michael Howell
  - High Throughput Screening Laboratory, The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
- Rebecca C Fitzgerald
  - MRC Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge, CB2 0XZ, UK
- Paola Scaffidi
  - Cancer Epigenetics Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - UCL Cancer Institute, University College London, London, WC1E 6DD, UK
- Francesca D Ciccarelli
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, SE1 1UL, UK
10
Repana D, Nulsen J, Dressler L, Bortolomeazzi M, Venkata SK, Tourna A, Yakovleva A, Palmieri T, Ciccarelli FD. The Network of Cancer Genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens. Genome Biol 2019; 20:1. [PMID: 30606230 PMCID: PMC6317252 DOI: 10.1186/s13059-018-1612-0] [Citation(s) in RCA: 358] [Impact Index Per Article: 71.6] [Received: 08/11/2018] [Accepted: 12/12/2018] [Indexed: 02/06/2023]
Abstract
The Network of Cancer Genes (NCG) is a manually curated repository of 2372 genes whose somatic modifications have known or predicted cancer driver roles. These genes were collected from 275 publications, including two sources of known cancer genes and 273 cancer sequencing screens of more than 100 cancer types from 34,905 cancer donors and multiple primary sites. This represents a more than 1.5-fold content increase compared to the previous version. NCG also annotates properties of cancer genes, such as duplicability, evolutionary origin, RNA and protein expression, miRNA and protein interactions, and protein function and essentiality. NCG is accessible at http://ncg.kcl.ac.uk/ .
Affiliation(s)
- Dimitra Repana
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King’s College London, London, SE1 1UL, UK
- Joel Nulsen
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King’s College London, London, SE1 1UL, UK
- Lisa Dressler
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King’s College London, London, SE1 1UL, UK
- Michele Bortolomeazzi
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King’s College London, London, SE1 1UL, UK
- Santhilata Kuppili Venkata
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King’s College London, London, SE1 1UL, UK
- Aikaterini Tourna
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King’s College London, London, SE1 1UL, UK
- Anna Yakovleva
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King’s College London, London, SE1 1UL, UK
- Tommaso Palmieri
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King’s College London, London, SE1 1UL, UK
- Francesca D. Ciccarelli
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
  - School of Cancer and Pharmaceutical Sciences, King’s College London, London, SE1 1UL, UK
11
Banerjee S, Feyertag F, Alvarez-Ponce D. Intrinsic protein disorder reduces small-scale gene duplicability. DNA Res 2017; 24:435-444. [PMID: 28430886 PMCID: PMC5737077 DOI: 10.1093/dnares/dsx015] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/08/2016] [Accepted: 03/28/2017] [Indexed: 01/23/2023]
Abstract
Whereas the rate of gene duplication is relatively high, only certain duplications survive the filter of natural selection and can contribute to genome evolution. However, the reasons why certain genes can be retained after duplication whereas others cannot remain largely unknown. Many proteins contain intrinsically disordered regions (IDRs), whose structures fluctuate between alternative conformational states. Due to their high flexibility, IDRs often enable protein–protein interactions and are the target of post-translational modifications. Intrinsically disordered proteins (IDPs) have characteristics that might either stimulate or hamper the retention of their encoding genes after duplication. On the one hand, IDRs may enable functional diversification, thus promoting duplicate retention. On the other hand, increased IDP availability is expected to result in deleterious unspecific interactions. Here, we interrogate the proteomes of human, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana and Escherichia coli, in order to ascertain the impact of protein intrinsic disorder on gene duplicability. We show that, in general, proteins encoded by duplicated genes tend to be less disordered than those encoded by singletons. The only exception is proteins encoded by ohnologs, which tend to be more disordered than those encoded by singletons or genes resulting from small-scale duplications. Our results indicate that duplication of genes encoding IDPs outside the context of whole-genome duplication (WGD) is often deleterious, but that IDRs facilitate retention of duplicates in the context of WGD. We discuss the potential evolutionary implications of our results.
Affiliation(s)
- Sanghita Banerjee
  - Department of Biology, University of Nevada, Reno, NV 89557, USA
  - Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India
- Felix Feyertag
  - Department of Biology, University of Nevada, Reno, NV 89557, USA
12
Modelling the evolution of transcription factor binding preferences in complex eukaryotes. Sci Rep 2017; 7:7596. [PMID: 28790414 PMCID: PMC5548724 DOI: 10.1038/s41598-017-07761-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 03/08/2017] [Accepted: 06/30/2017] [Indexed: 12/27/2022]
Abstract
Transcription factors (TFs) exert their regulatory action by binding to DNA with specific sequence preferences. However, different TFs can partially share their binding sequences due to their common evolutionary origin. This "redundancy" of binding defines a way of organizing TFs in "motif families" by grouping TFs with similar binding preferences. Since these ultimately define the TF target genes, the motif family organization entails information about the structure of transcriptional regulation as it has been shaped by evolution. Focusing on the human TF repertoire, we show that a one-parameter evolutionary model of the Birth-Death-Innovation type can explain the empirical distribution of TFs among motif families, and makes it possible to highlight the relevant evolutionary forces at the origin of this organization. Moreover, the model makes it possible to pinpoint a few deviations from the neutral scenario it assumes: three over-expanded families (including HOX and FOX genes), a set of "singleton" TFs for which duplication seems to be selected against, and a higher-than-average rate of diversification of the binding preferences of TFs with a Zinc Finger DNA binding domain. Finally, a comparison of the TF motif family organization in different eukaryotic species suggests an increase of redundancy of binding with organism complexity.
13
Extracting functional trends from whole genome duplication events using comparative genomics. Biol Proced Online 2016; 18:11. [PMID: 27168732 PMCID: PMC4862183 DOI: 10.1186/s12575-016-0041-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 01/26/2016] [Accepted: 04/24/2016] [Indexed: 01/06/2023]
Abstract
Background The number of species with completed genomes, including those with evidence for recent whole genome duplication events, has exploded. The recently sequenced Atlantic salmon genome has been through two rounds of whole genome duplication since the divergence of teleost fish from the lineage that led to amniotes. This quadrupling of the number of potential genes has led to complex patterns of retention and loss among gene families. Results Methods have been developed to characterize the interplay of duplicate gene retention processes across both whole genome duplication events and additional smaller scale duplication events. Further, gene expression divergence data have also become available for Atlantic salmon and the closely related, pre-whole genome duplication pike, and methods to describe expression divergence are also presented. These methods for the characterization of duplicate gene retention and gene expression divergence that have been applied to salmon are described. Conclusions With the growth in available genomic and functional data, the opportunities to extract functional inference from large scale duplicates using comparative methods have expanded dramatically. Recently developed methods that further this inference for duplicated genes have been described. Electronic supplementary material The online version of this article (doi:10.1186/s12575-016-0041-2) contains supplementary material, which is available to authorized users.
|
14
|
Li Z, Defoort J, Tasdighian S, Maere S, Van de Peer Y, De Smet R. Gene Duplicability of Core Genes Is Highly Consistent across All Angiosperms. Plant Cell 2016; 28:326-44. [PMID: 26744215 PMCID: PMC4790876 DOI: 10.1105/tpc.15.00877] [Citation(s) in RCA: 143] [Impact Index Per Article: 17.9] [Received: 10/13/2015] [Accepted: 01/04/2016] [Indexed: 05/02/2023]
Abstract
Gene duplication is an important mechanism for adding to genomic novelty. Hence, which genes undergo duplication and are preserved following duplication is an important question. It has been observed that gene duplicability, or the ability of genes to be retained following duplication, is a nonrandom process, with certain genes being more amenable to survive duplication events than others. Primarily, gene essentiality and the type of duplication (small-scale versus large-scale) have been shown in different species to influence the (long-term) survival of novel genes. However, an overarching view of "gene duplicability" is lacking, mainly due to the fact that previous studies usually focused on individual species and did not account for the influence of genomic context and the time of duplication. Here, we present a large-scale study in which we investigated duplicate retention for 9178 gene families shared between 37 flowering plant species, referred to as angiosperm core gene families. For most gene families, we observe a strikingly consistent pattern of gene duplicability across species, with gene families being either primarily single-copy or multicopy in all species. An intermediate class contains gene families that are often retained in duplicate for periods extending to tens of millions of years after whole-genome duplication, but ultimately appear to be largely restored to singleton status, suggesting that these genes may be dosage balance sensitive. The distinction between single-copy and multicopy gene families is reflected in their functional annotation, with single-copy genes being mainly involved in the maintenance of genome stability and organelle function and multicopy genes in signaling, transport, and metabolism. The intermediate class was overrepresented in regulatory genes, further suggesting that these represent putative dosage-balance-sensitive genes.
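The single-copy / intermediate / multicopy distinction described in this abstract can be illustrated with a minimal sketch. It assumes each gene family is given as a mapping from species to copy number; the function name `classify_family`, the 0.8 threshold, and the input layout are hypothetical choices made for illustration and are not the criteria used in the study.

```python
def classify_family(copy_numbers, threshold=0.8):
    """Classify one gene family from its per-species copy numbers.

    copy_numbers: dict mapping species name -> number of gene copies.
    threshold: assumed fraction of species that must agree (hypothetical value).
    """
    # Only species in which the family is present contribute to the profile.
    present = [n for n in copy_numbers.values() if n > 0]
    if not present:
        raise ValueError("family is absent from every species")

    single_fraction = sum(1 for n in present if n == 1) / len(present)
    multi_fraction = sum(1 for n in present if n >= 2) / len(present)

    if single_fraction >= threshold:
        return "single-copy"
    if multi_fraction >= threshold:
        return "multicopy"
    # Mixed profiles fall into the intermediate class.
    return "intermediate"


# Toy profiles for three hypothetical families across four species.
families = {
    "fam_organelle": {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 1},
    "fam_signaling": {"sp1": 3, "sp2": 2, "sp3": 4, "sp4": 2},
    "fam_regulator": {"sp1": 1, "sp2": 2, "sp3": 1, "sp4": 2},
}
labels = {name: classify_family(profile) for name, profile in families.items()}
```

A fixed threshold is the simplest possible rule; an analysis that accounts for duplication age or genomic context, as the abstract emphasizes, would need more than a copy-number profile.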
Affiliation(s)
- Zhen Li
- Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
- Jonas Defoort
- Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
- Setareh Tasdighian
- Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
- Steven Maere
- Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium; Genomics Research Institute, University of Pretoria, Pretoria 0028, South Africa
- Riet De Smet
- Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium; Bioinformatics Institute Ghent, Ghent University, B-9052 Ghent, Belgium
|
15
|
An O, Dall'Olio GM, Mourikis TP, Ciccarelli FD. NCG 5.0: updates of a manually curated repository of cancer genes and associated properties from cancer mutational screenings. Nucleic Acids Res 2015; 44:D992-9. [PMID: 26516186 PMCID: PMC4702816 DOI: 10.1093/nar/gkv1123] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 09/14/2015] [Accepted: 10/14/2015] [Indexed: 12/21/2022]
Abstract
The Network of Cancer Genes (NCG, http://ncg.kcl.ac.uk/) is a manually curated repository of cancer genes derived from the scientific literature. Due to the increasing amount of cancer genomic data, we have introduced a more robust procedure to extract cancer genes from published cancer mutational screenings and two curators independently reviewed each publication. NCG release 5.0 (August 2015) collects 1571 cancer genes from 175 published studies that describe 188 mutational screenings of 13 315 cancer samples from 49 cancer types and 24 primary sites. In addition to collecting cancer genes, NCG also provides information on the experimental validation that supports the role of these genes in cancer and annotates their properties (duplicability, evolutionary origin, expression profile, function and interactions with proteins and miRNAs).
Affiliation(s)
- Omer An
- Division of Cancer Studies, King's College London, London SE1 1UL, UK
- Thanos P Mourikis
- Division of Cancer Studies, King's College London, London SE1 1UL, UK
|
16
|
Module organization and variance in protein-protein interaction networks. Sci Rep 2015; 5:9386. [PMID: 25797237 PMCID: PMC4369690 DOI: 10.1038/srep09386] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 08/20/2014] [Accepted: 03/03/2015] [Indexed: 12/13/2022]
Abstract
A module is a group of closely related proteins that act in concert to perform specific biological functions through protein–protein interactions (PPIs) that occur in time and space. However, the underlying module organization and variance remain unclear. In this study, we collected module templates to infer respective module families, including 58,041 homologous modules in 1,678 species, and PPI families using searches of complete genomic database. We then derived PPI evolution scores and interface evolution scores to describe the module elements, including core and ring components. Functions of core components were highly correlated with those of essential genes. In comparison with ring components, core proteins/PPIs were conserved across multiple species. Subsequently, protein/module variance of PPI networks confirmed that core components form dynamic network hubs and play key roles in various biological functions. Based on the analyses of gene essentiality, module variance, and gene co-expression, we summarize the observations of module organization and variance as follows: 1) a module consists of core and ring components; 2) core components perform major biological functions and collaborate with ring components to execute certain functions in some cases; 3) core components are more conserved and essential during organizational changes in different biological states or conditions.
|
17
|
Acharya D, Mukherjee D, Podder S, Ghosh TC. Investigating different duplication pattern of essential genes in mouse and human. PLoS One 2015; 10:e0120784. [PMID: 25751152 PMCID: PMC4353620 DOI: 10.1371/journal.pone.0120784] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/18/2014] [Accepted: 01/27/2015] [Indexed: 11/18/2022]
Abstract
Gene duplication is one of the major driving forces shaping genome and organism evolution and is thought to be itself regulated by some intrinsic properties of the gene. Comparing essential genes between mouse and human, we observed that essential genes avoid duplication in mouse but tend to remain duplicated in humans. In this study, we explored the reasons behind this difference through a cross-species comparison of human and mouse. We found that essential genes that are duplicated in humans are functionally more redundant than those in mouse. The proportion of paralog pseudogenization of essential genes is higher in mouse than in humans. Duplicates of essential genes are under more stringent dosage regulation in human than in mouse. We also observed a slower evolutionary rate in the paralogs of human essential genes than in their mouse counterparts. Together, these results indicate that human essential genes are retained as duplicates to serve as backup copies that may shield them from harmful mutations.
Affiliation(s)
- Debarun Acharya
- Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
- Dola Mukherjee
- Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
- Soumita Podder
- Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
- Tapash C. Ghosh
- Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
|
18
|
Garcia-Alonso L, Jiménez-Almazán J, Carbonell-Caballero J, Vela-Boza A, Santoyo-López J, Antiñolo G, Dopazo J. The role of the interactome in the maintenance of deleterious variability in human populations. Mol Syst Biol 2014; 10:752. [PMID: 25261458 PMCID: PMC4299661 DOI: 10.15252/msb.20145222] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 02/22/2014] [Revised: 08/23/2014] [Accepted: 08/28/2014] [Indexed: 12/25/2022]
Abstract
Recent genomic projects have revealed the existence of an unexpectedly large amount of deleterious variability in the human genome. Several hypotheses have been proposed to explain such an apparently high mutational load. However, the mechanisms by which deleterious mutations in some genes cause a pathological effect but are apparently innocuous in other genes remain largely unknown. This study searched for deleterious variants in the 1000 Genomes populations, as well as in a newly sequenced population of 252 healthy Spanish individuals. In addition, variants causative of monogenic diseases and somatic variants from 41 chronic lymphocytic leukaemia patients were analysed. The deleterious variants found were analysed in the context of the interactome to understand the role of network topology in the maintenance of the observed mutational load. Our results suggest that one of the mechanisms whereby the effect of these deleterious variants on the phenotype is suppressed could be related to the configuration of the protein interaction network. Most of the deleterious variants observed in healthy individuals are concentrated in peripheral regions of the interactome, in combinations that preserve their connectivity, and have a marginal effect on interactome integrity. In contrast, likely pathogenic cancer somatic deleterious variants tend to occur in internal regions of the interactome, often with associated structural consequences. Finally, variants causative of monogenic diseases seem to occupy an intermediate position. Our observations suggest that the real pathological potential of a variant might be more a systems property than an intrinsic property of individual proteins.
Affiliation(s)
- Luz Garcia-Alonso
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Jorge Jiménez-Almazán
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain; Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, Spain
- Jose Carbonell-Caballero
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Alicia Vela-Boza
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain
- Javier Santoyo-López
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain
- Guillermo Antiñolo
- Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain; Department of Genetics, Reproduction and Fetal Medicine, Institute of Biomedicine of Seville, University Hospital Virgen del Rocio/Consejo Superior de Investigaciones Científicas/University of Seville, Seville, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Seville, Spain
- Joaquin Dopazo
- Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain; Bioinformatics of Rare Diseases (BIER), CIBER de Enfermedades Raras (CIBERER), Valencia, Spain; Medical Genome Project, Genomics and Bioinformatics Platform of Andalusia (GBPA), Seville, Spain; Functional Genomics Node, (INB) at CIPF, Valencia, Spain
|
19
|
Guo Z, Jiang W, Lages N, Borcherds W, Wang D. Relationship between gene duplicability and diversifiability in the topology of biochemical networks. BMC Genomics 2014; 15:577. [PMID: 25005725 PMCID: PMC4129122 DOI: 10.1186/1471-2164-15-577] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/29/2014] [Accepted: 06/26/2014] [Indexed: 01/21/2023]
Abstract
Background: Selective gene duplicability, the extensive expansion of a small number of gene families, is universal. Quantitatively, the number of genes P(K) with K duplicates in a genome decreases precipitously as K increases, and often follows a power law ($P(K) \propto K^{-\alpha}$). Functional diversification, either neo- or sub-functionalization, is a major evolutionary route for duplicate genes. Results: Using three lines of genomic datasets, we studied the relationship between gene duplicability and diversifiability in the topology of biochemical networks. First, we explored the scenario where two pathways in the biochemical networks antagonize each other. Synthetic knockout of respective genes for the two pathways rescues the phenotypic defects of each individual knockout. We identified duplicate gene pairs with sufficient divergences that represent this antagonism relationship in the yeast S. cerevisiae. Such pairs overwhelmingly belong to large gene families, and thus tend to have high duplicability. Second, we used distances between proteins of duplicate genes in the protein interaction network as a metric of their diversification. The higher a gene's duplicate count, the further the proteins of this gene and its duplicates drift away from one another in the networks, which is especially true for genetically antagonizing duplicate genes. Third, we computed a sequence-homology-based clustering coefficient to quantify sequence diversifiability among duplicate genes: the lower the coefficient, the more the sequences have diverged. The duplicate count (K) of a gene is negatively correlated with the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family. Conclusion: Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks, an improvement of our understanding of gene duplicability.
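The power-law relationship between duplicate count K and the number of genes P(K) mentioned in this abstract can be made concrete with a small sketch. It tabulates P(K) from a list of per-gene duplicate counts and estimates the exponent by ordinary least squares on the log-log histogram. The function names and the toy numbers are assumptions for illustration, and a log-log regression is only a rough estimator; nothing here reproduces the estimation procedure of the cited study.

```python
import math
from collections import Counter


def duplicate_count_distribution(duplicate_counts):
    """Return {K: number of genes that have exactly K duplicates}."""
    return dict(Counter(duplicate_counts))


def fit_power_law_exponent(distribution):
    """Estimate alpha in P(K) ~ K**(-alpha) by least squares on log-log points.

    distribution: dict mapping K (positive int) -> gene count P(K).
    """
    points = [
        (math.log(k), math.log(n))
        for k, n in distribution.items()
        if k > 0 and n > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two distinct K values with genes")

    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    # The fitted slope is -alpha, so flip the sign.
    return -sxy / sxx


# Toy distribution constructed to follow P(K) = 1600 * K**-2 exactly.
toy_distribution = {1: 1600, 2: 400, 4: 100}
alpha = fit_power_law_exponent(toy_distribution)
```

On real family-size data the tail is sparse and noisy, so a maximum-likelihood fit is usually preferred over regression on binned counts.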
Affiliation(s)
- Degeng Wang
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, 8403 Floyd Curl Drive, San Antonio, TX 78229-3900, USA.
|
20
|
An O, Pendino V, D'Antonio M, Ratti E, Gentilini M, Ciccarelli FD. NCG 4.0: the network of cancer genes in the era of massive mutational screenings of cancer genomes. Database (Oxford) 2014; 2014:bau015. [PMID: 24608173 PMCID: PMC3948431 DOI: 10.1093/database/bau015] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 12/22/2022]
Abstract
NCG 4.0 is the latest update of the Network of Cancer Genes, a web-based repository of systems-level properties of cancer genes. In its current version, the database collects information on 537 known (i.e. experimentally supported) and 1463 candidate (i.e. inferred using statistical methods) cancer genes. Candidate cancer genes derive from the manual revision of 67 original publications describing the mutational screening of 3460 human exomes and genomes in 23 different cancer types. For all 2000 cancer genes, duplicability, evolutionary origin, expression, functional annotation, interaction network with other human proteins and with microRNAs are reported. In addition to providing a substantial update of cancer-related information, NCG 4.0 also introduces two new features. The first is the annotation of possible false-positive cancer drivers, defined as candidate cancer genes inferred from large-scale screenings whose association with cancer is likely to be spurious. The second is the description of the systems-level properties of 64 human microRNAs that are causally involved in cancer progression (oncomiRs). Owing to the manual revision of all information, NCG 4.0 constitutes a complete and reliable resource on human coding and non-coding genes whose deregulation drives cancer onset and/or progression. NCG 4.0 can also be downloaded as a free application for Android smart phones. Database URL: http://bio.ieo.eu/ncg/.
Affiliation(s)
- Omer An
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy; Division of Cancer Studies, King's College London, London SE1 1UL, UK
|
21
|
Differential evolutionary constraints in the evolution of chemoreceptors: a murine and human case study. ScientificWorldJournal 2014; 2014:696485. [PMID: 24587745 PMCID: PMC3920627 DOI: 10.1155/2014/696485] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/07/2013] [Accepted: 10/23/2013] [Indexed: 02/08/2023]
Abstract
Chemoreception is among the most important sensory modalities in animals. Organisms use the ability to perceive chemical compounds in all major ecological activities. Recent studies have allowed the characterization of chemoreceptor gene families. These genes present strikingly high variability in copy numbers and pseudogenization degrees among different species, but the mechanisms underlying their evolution are not fully understood. We have analyzed the functional networks of these genes, their orthologs distribution, and performed phylogenetic analyses in order to investigate their evolutionary dynamics. We have modeled the chemosensory networks and compared the evolutionary constraints of their genes in Mus musculus, Homo sapiens, and Rattus norvegicus. We have observed significant differences regarding the constraints on the orthologous groups and network topologies of chemoreceptors and signal transduction machinery. Our findings suggest that chemosensory receptor genes are less constrained than their signal transducing machinery, resulting in greater receptor diversity and conservation of information processing pathways. More importantly, we have observed significant differences among the receptors themselves, suggesting that olfactory and bitter taste receptors are more conserved than vomeronasal receptors.
|
22
|
Emmert-Streib F, Zhang SD, Hamilton P. Dry computational approaches for wet medical problems. J Transl Med 2014; 12:26. [PMID: 24460894 PMCID: PMC3905162 DOI: 10.1186/1479-5876-12-26] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/06/2013] [Accepted: 01/23/2014] [Indexed: 11/10/2022]
Abstract
This is a report on the 4th international conference in 'Quantitative Biology and Bioinformatics in Modern Medicine' held in Belfast (UK), 19-20 September 2013. The aim of the conference was to bring together leading experts from a variety of different areas that are key for Systems Medicine to exchange novel findings and promote interdisciplinary ideas and collaborations.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Lisburn Road 97, Belfast, UK.
|
23
|
Zahiri J, Bozorgmehr JH, Masoudi-Nejad A. Computational Prediction of Protein-Protein Interaction Networks: Algorithms and Resources. Curr Genomics 2014; 14:397-414. [PMID: 24396273 PMCID: PMC3861891 DOI: 10.2174/1389202911314060004] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Received: 06/12/2013] [Revised: 08/07/2013] [Accepted: 08/26/2013] [Indexed: 01/15/2023]
Abstract
Protein interactions play an important role in the discovery of protein functions and pathways in biological processes. This is especially true for diseases caused by the loss of specific protein-protein interactions in the organism. The accuracy of experimental results in finding protein-protein interactions, however, is rather dubious, and high-throughput experiments have shown both high false-positive and high false-negative rates for protein interactions. Computational methods have attracted tremendous attention among biologists because of their ability to predict protein-protein interactions and validate the obtained experimental results. In this study, we review several computational methods for protein-protein interaction prediction and describe the major databases, which store both predicted and detected protein-protein interactions, as well as the tools used for analyzing protein interaction networks and improving protein-protein interaction reliability.
Affiliation(s)
- Javad Zahiri
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Iran
- Joseph Hannon Bozorgmehr
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Iran
|
24
|
D'Antonio M, Guerra RF, Cereda M, Marchesi S, Montani F, Nicassio F, Di Fiore PP, Ciccarelli FD. Recessive cancer genes engage in negative genetic interactions with their functional paralogs. Cell Rep 2013; 5:1519-26. [PMID: 24360954 DOI: 10.1016/j.celrep.2013.11.033] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 06/20/2013] [Revised: 09/25/2013] [Accepted: 11/18/2013] [Indexed: 01/01/2023]
Abstract
Cancer genetic heterogeneity offers a wide repertoire of molecular determinants to be screened as therapeutic targets. Here, we identify potential anticancer targets by exploiting negative genetic interactions between genes with driver loss-of-function mutations (recessive cancer genes) and their functionally redundant paralogs. We identify recessive genes with additional copies and experimentally test our predictions on three paralogous pairs. We confirm digenic negative interactions between two cancer genes (SMARCA4 and CDH1) and their corresponding paralogs (SMARCA2 and CDH3). Furthermore, we identify a trigenic negative interaction between the cancer gene DNMT3A, its functional paralog DNMT3B, and a third gene, DNMT1, which encodes the only other human DNA-methylase domain. Although our study does not exclude other causes of synthetic lethality, it suggests that functionally redundant paralogs of cancer genes could be targets in anticancer therapy.
Affiliation(s)
- Matteo D'Antonio
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy
- Rosalinda F Guerra
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy
- Matteo Cereda
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy
- Stefano Marchesi
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy
- Francesca Montani
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy
- Francesco Nicassio
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy; Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia, 20139 Milan, Italy
- Pier Paolo Di Fiore
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy; IFOM, Fondazione Istituto FIRC di Oncologia Molecolare, Via Adamello 16, 20139 Milan, Italy; Dipartimento di Scienze della Salute, Università degli Studi di Milano, Via di Rudinì 8, 20122 Milan, Italy
- Francesca D Ciccarelli
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy; Division of Cancer Studies, King's College London, London SE1 1UL, UK.
|
25
|
D'Antonio M, Ciccarelli FD. Integrated analysis of recurrent properties of cancer genes to identify novel drivers. Genome Biol 2013; 14:R52. [PMID: 23718799 PMCID: PMC4054099 DOI: 10.1186/gb-2013-14-5-r52] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 12/19/2012] [Accepted: 05/29/2013] [Indexed: 11/10/2022]
Abstract
The heterogeneity of cancer genomes in terms of acquired mutations complicates the identification of genes whose modification may exert a driver role in tumorigenesis. In this study, we present a novel method that integrates expression profiles, mutation effects, and systemic properties of mutated genes to identify novel cancer drivers. We applied our method to ovarian cancer samples and were able to identify putative drivers in the majority of carcinomas without mutations in known cancer genes, thus suggesting that it can be used as a complementary approach to find rare driver mutations that cannot be detected using frequency-based approaches.
|
26
|
Alvarez-Ponce D, Fares MA. Evolutionary rate and duplicability in the Arabidopsis thaliana protein-protein interaction network. Genome Biol Evol 2013; 4:1263-74. [PMID: 23160177 PMCID: PMC3542556 DOI: 10.1093/gbe/evs101] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/13/2022]
Abstract
Genes show a bewildering variation in their patterns of molecular evolution, as a result of the action of different levels and types of selective forces. The factors underlying this variation are, however, still poorly understood. In the last decade, the position of proteins in the protein-protein interaction network has been put forward as a determinant factor of the evolutionary rate and duplicability of their encoding genes. This conclusion, however, has been based on the analysis of the limited number of microbes and animals for which interactome-level data are available (essentially, Escherichia coli, yeast, worm, fly, and humans). Here, we study, for the first time, the relationship between the position of proteins in the high-density interactome of a plant (Arabidopsis thaliana) and the patterns of molecular evolution of their encoding genes. We found that genes whose encoded products act at the center of the network are more evolutionarily constrained than those acting at the network periphery. This trend remains significant when potential confounding factors (gene expression level and breadth, duplicability, function, and length of the encoded products) are controlled for. Even though the correlation between centrality measures and rates of evolution is generally weak, for some functional categories, it is comparable in strength to (or even stronger than) the correlation between evolutionary rates and expression levels or breadths. In addition, genes encoding interacting proteins in the network evolve at relatively similar rates. Finally, Arabidopsis proteins encoded by duplicated genes are more highly connected than those encoded by singleton genes. This observation is in agreement with the patterns observed in humans, but in contrast with those observed in E. coli, yeast, worm, and fly (whose duplicated genes tend to act at the periphery of the network), implying that the relationship between duplicability and centrality inverted at least twice during eukaryote evolution. Taken together, these results indicate that the structure of the A. thaliana network constrains the evolution of its components at multiple levels.
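The kind of centrality-versus-rate comparison this abstract describes can be sketched in a few lines. The example below uses node degree as the centrality measure and a Spearman rank correlation against per-gene evolutionary rates. The function names, the toy edge list, and the rate values are all hypothetical; the cited study used its own centrality measures, datasets, and confounder controls, none of which are reproduced here.

```python
import math


def node_degrees(edges):
    """Degree of each node in an undirected network (self-loops and repeats ignored)."""
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degrees = {}
    for pair in unique:
        for node in pair:
            degrees[node] = degrees.get(node, 0) + 1
    return degrees


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


def centrality_rate_correlation(edges, rates):
    """Correlate degree with evolutionary rate over genes present in both inputs."""
    degrees = node_degrees(edges)
    genes = sorted(set(degrees) & set(rates))
    return spearman([degrees[g] for g in genes], [rates[g] for g in genes])


# Toy network and made-up rates: the hub "A" evolves slowest.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "B")]
toy_rates = {"A": 0.05, "B": 0.10, "C": 0.12, "D": 0.30}
rho = centrality_rate_correlation(toy_edges, toy_rates)
```

A negative `rho` in this toy case corresponds to the pattern the abstract reports, with more central genes being more constrained; the abstract also notes that the real correlation is generally weak.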
Affiliation(s)
- David Alvarez-Ponce
- Department of Abiotic Stress, Integrative and Systems Biology Laboratory, Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas (CSIC-UPV), Valencia, Spain.
|
27
|
Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. Proc Natl Acad Sci U S A 2013; 110:2898-903. [PMID: 23382190 DOI: 10.1073/pnas.1300127110] [Citation(s) in RCA: 245] [Impact Index Per Article: 22.3] [Indexed: 12/29/2022]
Abstract
The importance of gene gain through duplication has long been appreciated. In contrast, the importance of gene loss has only recently attracted attention. Indeed, studies in organisms ranging from plants to worms and humans suggest that duplication of some genes might be better tolerated than that of others. Here we have undertaken a large-scale study to investigate the existence of duplication-resistant genes in the sequenced genomes of 20 flowering plants. We demonstrate that there is a large set of genes that is convergently restored to single-copy status following multiple genome-wide and smaller scale duplication events. We rule out the possibility that such a pattern could be explained by random gene loss only and therefore propose that there is selection pressure to preserve such genes as singletons. This is further substantiated by the observation that angiosperm single-copy genes do not comprise a random fraction of the genome, but instead are often involved in essential housekeeping functions that are highly conserved across all eukaryotes. Furthermore, single-copy genes are generally expressed more highly and in more tissues than non-single-copy genes, and they exhibit higher sequence conservation. Finally, we propose different hypotheses to explain their resistance against duplication.
|
28
|
Liu BA, Nash PD. Evolution of SH2 domains and phosphotyrosine signalling networks. Philos Trans R Soc Lond B Biol Sci 2012; 367:2556-73. [PMID: 22889907 DOI: 10.1098/rstb.2012.0107] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 01/09/2023]
Abstract
Src homology 2 (SH2) domains mediate selective protein-protein interactions with tyrosine phosphorylated proteins, and in doing so define specificity of phosphotyrosine (pTyr) signalling networks. SH2 domains and protein-tyrosine phosphatases expand alongside protein-tyrosine kinases (PTKs) to coordinate cellular and organismal complexity in the evolution of the unikont branch of the eukaryotes. Examination of conserved families of PTKs and SH2 domain proteins provides fiduciary marks that trace the evolutionary landscape for the development of complex cellular systems in the proto-metazoan and metazoan lineages. The evolutionary provenance of conserved SH2 and PTK families reveals the mechanisms by which diversity is achieved through adaptations in tissue-specific gene transcription, altered ligand binding, insertions of linear motifs and the gain or loss of domains following gene duplication. We discuss mechanisms by which pTyr-mediated signalling networks evolve through the development of novel and expanded families of SH2 domain proteins and the elaboration of connections between pTyr-signalling proteins. These changes underlie the variety of general and specific signalling networks that give rise to tissue-specific functions and increasingly complex developmental programmes. Examination of SH2 domains from an evolutionary perspective provides insight into the process by which evolutionary expansion and modification of molecular protein interaction domain proteins permits the development of novel protein-interaction networks and accommodates adaptation of signalling networks.
Affiliation(s)
- Bernard A Liu
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
|
29
|
Zhu Y, Du P, Nakhleh L. Gene duplicability-connectivity-complexity across organisms and a neutral evolutionary explanation. PLoS One 2012; 7:e44491. [PMID: 22984517 PMCID: PMC3439388 DOI: 10.1371/journal.pone.0044491] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/14/2012] [Accepted: 08/02/2012] [Indexed: 02/02/2023]
Abstract
Gene duplication has long been acknowledged by biologists as a major evolutionary force shaping genomic architectures and characteristics across the Tree of Life. Much research has been conducted on elucidating the fate of duplicated genes in a variety of organisms, as well as the factors that affect a gene’s duplicability, that is, the tendency of certain genes to retain more duplicates than others. In particular, two studies have looked at the correlation between a gene's duplicability and its degree in a protein-protein interaction network in yeast, mouse, and human, and another has looked at the correlation between a gene's duplicability and its complexity (length, number of domains, etc.) in yeast. In this paper, we extend these studies to six species, and two trends emerge. There is an increase in the duplicability-connectivity correlation that agrees with the increase in the genome size as well as the phylogenetic relationship of the species. Further, the duplicability-complexity correlation seems to be constant across the species. We argue that the observed correlations can be explained by neutral evolutionary forces acting on the genomic regions containing the genes. For the duplicability-connectivity correlation, we show through simulations that an increasing trend can be obtained by adjusting parameters to approximate genomic characteristics of the respective species. Our results call for more research into factors, adaptive and non-adaptive alike, that determine a gene’s duplicability.
Affiliation(s)
- Yun Zhu
- Department of Computer Science, Rice University, Houston, Texas, United States of America
- Peng Du
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas, United States of America
|
30
|
A network synthesis model for generating protein interaction network families. PLoS One 2012; 7:e41474. [PMID: 22912671 PMCID: PMC3418285 DOI: 10.1371/journal.pone.0041474] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 12/20/2011] [Accepted: 06/27/2012] [Indexed: 11/19/2022]
Abstract
In this work, we introduce a novel network synthesis model that can generate families of evolutionarily related synthetic protein-protein interaction (PPI) networks. Given an ancestral network, the proposed model generates the network family according to a hypothetical phylogenetic tree, where the descendant networks are obtained through duplication and divergence of their ancestors, followed by network growth using network evolution models. We demonstrate that this network synthesis model can effectively create synthetic networks whose internal and cross-network properties closely resemble those of real PPI networks. The proposed model can serve as an effective framework for generating comprehensive benchmark datasets that can be used for reliable performance assessment of comparative network analysis algorithms. Using this model, we constructed a large-scale network alignment benchmark, called NAPAbench, and evaluated the performance of several representative network alignment algorithms. Our analysis clearly shows the relative performance of the leading network algorithms, with their respective advantages and disadvantages. The algorithm and source code of the network synthesis model and the network alignment benchmark NAPAbench are publicly available at http://www.ece.tamu.edu/bjyoon/NAPAbench/.
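The duplication-and-divergence step described in this abstract can be illustrated with a minimal sketch. This is not the NAPAbench implementation: the function names, the `keep_prob` parameter, the integer node labels and the two-species "family" are all assumptions made for illustration, and the subsequent network-growth phase mentioned in the abstract is omitted.

```python
import random


def duplicate_and_diverge(network, keep_prob, rng):
    """Return a descendant network after one duplication-divergence step.

    network   -- dict mapping integer node ids 0..n-1 to sets of neighbours
                 (undirected, so every edge is stored in both directions)
    keep_prob -- hypothetical probability that the copy keeps an inherited edge
    rng       -- random.Random instance, passed in for reproducibility
    """
    # Copy the ancestor so the input network is left unchanged.
    child = {node: set(nbrs) for node, nbrs in network.items()}
    source = rng.choice(sorted(child))   # node to duplicate
    copy = len(child)                    # next unused integer id
    child[copy] = set()
    # Divergence: the copy inherits each of the source's edges independently.
    for nbr in sorted(network[source]):
        if rng.random() < keep_prob:
            child[copy].add(nbr)
            child[nbr].add(copy)
    return child


def evolve(network, steps, keep_prob, seed):
    """Apply several duplication-divergence steps along one tree branch."""
    rng = random.Random(seed)
    for _ in range(steps):
        network = duplicate_and_diverge(network, keep_prob, rng)
    return network


# Two descendants of the same (toy) ancestral triangle network.
ancestor = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
species_a = evolve(ancestor, steps=5, keep_prob=0.6, seed=1)
species_b = evolve(ancestor, steps=5, keep_prob=0.6, seed=2)
```

Because both descendants start from the same ancestor, they share the original three nodes, which is the kind of known correspondence a network alignment benchmark needs.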
|
31
|
The ecology of bacterial genes and the survival of the new. International Journal of Evolutionary Biology 2012; 2012:394026. [PMID: 22900231 PMCID: PMC3415099 DOI: 10.1155/2012/394026] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 04/21/2012] [Accepted: 06/26/2012] [Indexed: 11/18/2022]
Abstract
Much of the observed variation among closely related bacterial genomes is attributable to gains and losses of genes that are acquired horizontally as well as to gene duplications and larger amplifications. The genomic flexibility that results from these mechanisms certainly contributes to the ability of bacteria to survive and adapt in varying environmental challenges. However, the duplicability and transferability of individual genes imply that natural selection should operate, not only at the organismal level, but also at the level of the gene. Genes can be considered semiautonomous entities that possess specific functional niches and evolutionary dynamics. The evolution of bacterial genes should respond both to selective pressures that favor competition, mostly among orthologs or paralogs that may occupy the same functional niches, and cooperation, with the majority of other genes coexisting in a given genome. The relative importance of either type of selection is likely to vary among different types of genes, based on the functional niches they cover and on the tightness of their association with specific organismal lineages. The frequent availability of new functional niches caused by environmental changes and biotic evolution should enable the constant diversification of gene families and the survival of new lineages of genes.
|
32
|
Doherty A, Alvarez-Ponce D, McInerney JO. Increased genome sampling reveals a dynamic relationship between gene duplicability and the structure of the primate protein-protein interaction network. Mol Biol Evol 2012; 29:3563-73. [PMID: 22723304 DOI: 10.1093/molbev/mss165] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
Abstract
Although gene duplications occur at a high rate, only a small fraction of these are retained. The position of a gene's encoded product in the protein-protein interaction network has recently emerged as a determining factor of gene duplicability. However, the direction of the relationship between network centrality and duplicability is not universal: in Escherichia coli, yeast, fly, and worm, duplicated genes more often act at the periphery of the network, whereas in humans, such genes tend to occupy the most central positions. Herein, we have inferred duplication events that took place in the different branches of the primate phylogeny. In agreement with previous observations, we found that duplications generally affected the most central network genes, which is presumably the process that has most influenced the trend in humans. However, the opposite trend (that is, duplication being more common in genes whose encoded products are peripheral in the network) is observed for three recent branches, including, quite counterintuitively, the external branch leading to humans. This indicates a shift in the relationship between centrality and duplicability during primate evolution. Furthermore, we found that genes encoding interacting proteins exhibit phylogenetic tree topologies that are more similar than expected for random pairs and that genes duplicated in a given branch of the phylogeny tend to interact with those that duplicated in the same lineage. These results indicate that duplication of a gene increases the likelihood of duplication of its interacting partners. Our observations indicate that the structure of the primate protein-protein interaction network affects gene duplicability in previously unrecognized ways.
Affiliation(s)
- Aoife Doherty
- Department of Biology, National University of Ireland Maynooth, Maynooth, County Kildare, Ireland
|
33
|
Thiergart T, Landan G, Schenk M, Dagan T, Martin WF. An evolutionary network of genes present in the eukaryote common ancestor polls genomes on eukaryotic and mitochondrial origin. Genome Biol Evol 2012; 4:466-85. [PMID: 22355196 PMCID: PMC3342870 DOI: 10.1093/gbe/evs018] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Indexed: 11/12/2022]
Abstract
To test the predictions of competing and mutually exclusive hypotheses for the origin of eukaryotes, we identified from a sample of 27 sequenced eukaryotic and 994 sequenced prokaryotic genomes 571 genes that were present in the eukaryote common ancestor and that have homologues among eubacterial and archaebacterial genomes. Maximum-likelihood trees identified the prokaryotic genomes that most frequently contained genes branching as the sister to the eukaryotic nuclear homologues. Among the archaebacteria, euryarchaeote genomes most frequently harbored the sister to the eukaryotic nuclear gene, whereas among eubacteria, the α-proteobacteria were most frequently represented within the sister group. Only 3 genes out of 571 gave a 3-domain tree. Homologues from α-proteobacterial genomes that branched as the sister to nuclear genes were found more frequently in genomes of facultatively anaerobic members of the Rhizobiales and Rhodospirillales than in obligate intracellular rickettsial parasites. Following α-proteobacteria, the most frequent eubacterial sister lineages were γ-proteobacteria, δ-proteobacteria, and firmicutes, which were also the prokaryote genomes least frequently found as monophyletic groups in our trees. Although all 22 higher prokaryotic taxa sampled (crenarchaeotes, γ-proteobacteria, spirochaetes, chlamydias, etc.) harbor genes that branch as the sister to homologues present in the eukaryotic common ancestor, that is not evidence of 22 different prokaryotic cells participating at eukaryote origins because prokaryotic “lineages” have laterally acquired genes for more than 1.5 billion years since eukaryote origins. The data underscore the archaebacterial (host) nature of the eukaryotic informational genes and the eubacterial (mitochondrial) nature of eukaryotic energy metabolism.
The network linking genes of the eukaryote ancestor to contemporary homologues distributed across prokaryotic genomes elucidates eukaryote gene origins in a dialect cognizant of gene transfer in nature.
Affiliation(s)
- Thorsten Thiergart
- Institute of Molecular Evolution, Heinrich-Heine University Düsseldorf, Germany
|
34
|
D'Antonio M, Pendino V, Sinha S, Ciccarelli FD. Network of Cancer Genes (NCG 3.0): integration and analysis of genetic and network properties of cancer genes. Nucleic Acids Res 2012; 40:D978-83. [PMID: 22080562 PMCID: PMC3245144 DOI: 10.1093/nar/gkr952] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 09/13/2011] [Accepted: 10/12/2011] [Indexed: 12/22/2022]
Abstract
The identification of a constantly increasing number of genes whose mutations are causally implicated in tumor initiation and progression (cancer genes) requires the development of tools to store and analyze them. The Network of Cancer Genes (NCG 3.0) collects information on 1494 cancer genes that have been found mutated in 16 different cancer types. These genes were collected from the Cancer Gene Census as well as from 18 whole exome and 11 whole-genome screenings of cancer samples. For each cancer gene, NCG 3.0 provides a summary of the gene features and the cross-reference to other databases. In addition, it describes duplicability, evolutionary origin, orthology, network properties, interaction partners, microRNA regulation and functional roles of cancer genes and of all genes that are related to them. This integrated network of information can be used to better characterize cancer genes in the context of the system in which they act. The data can also be used to identify novel candidates that share the same properties of known cancer genes and may therefore play a similar role in cancer. NCG 3.0 is freely available at http://bio.ifom-ieo-campus.it/ncg.
Affiliation(s)
- Francesca D. Ciccarelli
- Department of Experimental Oncology, European Institute of Oncology, IFOM-IEO Campus, Via Adamello 16, 20139 Milan, Italy
|
35
|
Konrad A, Teufel AI, Grahnen JA, Liberles DA. Toward a general model for the evolutionary dynamics of gene duplicates. Genome Biol Evol 2011; 3:1197-209. [PMID: 21920903 PMCID: PMC3205605 DOI: 10.1093/gbe/evr093] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 01/27/2023]
Abstract
Gene duplication is an important process in the functional divergence of genes and genomes. Several processes have been described that lead to duplicate gene retention over different timescales after both smaller-scale events and whole-genome duplication, including neofunctionalization, subfunctionalization, and dosage balance. Two common modes of duplicate gene loss include nonfunctionalization and loss due to population dynamics (failed fixation). Previous work has characterized expectations of duplicate gene retention under the neofunctionalization and subfunctionalization models. Here, that work is extended to dosage balance using simulations. A general model for duplicate gene loss/retention is then presented that is capable of fitting expectations under the different models, is defined at t = 0, and decays to an orthologous asymptotic rate rather than zero, based upon a modified Weibull hazard function. The model in a maximum likelihood framework shows the property of identifiability, recovering the evolutionary mechanism and parameters of simulation. This model is also capable of recovering the evolutionary mechanism of simulation from data generated using an unrelated network population genetic model. Lastly, the general model is applied as part of a mixture model to recent gene duplicates from the Oikopleura dioica genome, suggesting that neofunctionalization may be an important process leading to duplicate gene retention in that organism.
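The two properties the abstract names for the loss-rate model (defined at t = 0, and decaying to a nonzero asymptote) can be contrasted with the standard Weibull hazard in a short sketch. The abstract does not give the authors' exact parameterisation, so the functional form of `modified_hazard` and its parameter names (`initial`, `asymptote`, `b`, `c`) are assumptions chosen only because they satisfy those two properties.

```python
import math


def weibull_hazard(t, shape, scale):
    """Standard Weibull hazard: (shape / scale) * (t / scale) ** (shape - 1).

    For shape < 1 this is undefined at t = 0 (Python raises
    ZeroDivisionError) and it tends to zero as t grows.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


def modified_hazard(t, initial, asymptote, b, c):
    """Illustrative modified hazard (assumed form, not taken from the paper).

    Equals `initial` at t = 0 and decays towards `asymptote`, not zero,
    as t grows; b > 0 and c > 0 control how fast the decay happens.
    """
    return asymptote + (initial - asymptote) * math.exp(-b * t ** c)


# Loss rate of a duplicate at a few ages (arbitrary time units).
rates = [modified_hazard(t, initial=0.8, asymptote=0.1, b=1.0, c=0.5)
         for t in (0.0, 1.0, 10.0, 1e6)]
```

The values in `rates` fall monotonically from the initial rate towards the asymptote, whereas `weibull_hazard` with `shape < 1` cannot be evaluated at `t = 0` at all.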
Affiliation(s)
- Anke Konrad
- Department of Molecular Biology, University of Wyoming
|