1
|
Parallel evolution despite low genetic diversity in three-spined sticklebacks. Proc Biol Sci 2024; 291:20232617. [PMID: 38593844 PMCID: PMC11003780 DOI: 10.1098/rspb.2023.2617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 03/08/2024] [Indexed: 04/11/2024] Open
Abstract
When populations repeatedly adapt to similar environments they can evolve similar phenotypes based on shared genetic mechanisms (parallel evolution). The likelihood of parallel evolution is affected by demographic history, as it depends on the standing genetic variation of the source population. The three-spined stickleback (Gasterosteus aculeatus) repeatedly colonized and adapted to brackish and freshwater. Most parallel evolution studies in G. aculeatus were conducted at high latitudes, where freshwater populations maintain connectivity to the source marine populations. Here, we analysed southern and northern European marine and freshwater populations to test two hypotheses. First, that southern European freshwater populations (which currently lack connection to marine populations) lost genetic diversity due to bottlenecks and inbreeding compared to their northern counterparts. Second, that the degree of genetic parallelism is higher among northern than southern European freshwater populations, as the latter have been subjected to strong drift due to isolation. The results show that southern populations exhibit lower genetic diversity but a higher degree of genetic parallelism than northern populations. Hence, they confirm the hypothesis that southern populations have lost genetic diversity, but this loss probably happened after they had already adapted to freshwater conditions, explaining the high degree of genetic parallelism in the south.
|
2
|
Abstract
Transcription factors (TFs) play a pivotal role as regulators of gene expression, orchestrating the formation and maintenance of diverse animal body plans and innovations. However, the precise contributions of TFs and the underlying mechanisms driving the origin of basal metazoan body plans, particularly in ctenophores, remain elusive. Here, we present a comprehensive catalog of TFs in 2 ctenophore species, Pleurobrachia bachei and Mnemiopsis leidyi, revealing 428 and 418 TFs in their respective genomes. In contrast, morphologically simpler metazoans have a reduced TF representation compared to ctenophores, cnidarians, and bilaterians: the sponge Amphimedon encodes 277 TFs, and the placozoan Trichoplax adhaerens encodes 274 TFs. The emergence of complex ctenophore tissues and organs coincides with significant lineage-specific diversification of the zinc finger C2H2 (ZF-C2H2) and homeobox superfamilies of TFs. Notably, the lineages leading to Amphimedon and Trichoplax exhibit independent expansions of leucine zipper (BZIP) TFs. Some lineage-specific TFs may have evolved through the domestication of mobile elements, thereby supporting alternative mechanisms of parallel TF evolution and body plan diversification across the Metazoa.
|
3
|
GENESIS: Gene-Specific Machine Learning Models for Variants of Uncertain Significance Found in Catecholaminergic Polymorphic Ventricular Tachycardia and Long QT Syndrome-Associated Genes. Circ Arrhythm Electrophysiol 2022; 15:e010326. [PMID: 35357185 PMCID: PMC9018586 DOI: 10.1161/circep.121.010326] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 12/20/2022]
Abstract
BACKGROUND Cardiac channelopathies such as catecholaminergic polymorphic ventricular tachycardia and long QT syndrome predispose patients to fatal arrhythmias and sudden cardiac death. As genetic testing has become common in clinical practice, variants of uncertain significance (VUS) in genes associated with catecholaminergic polymorphic ventricular tachycardia and long QT syndrome are frequently found. The objective of this study was to predict pathogenicity of catecholaminergic polymorphic ventricular tachycardia-associated RYR2 VUS and long QT syndrome-associated VUS in KCNQ1, KCNH2, and SCN5A by developing gene-specific machine learning models and assessing them using cross-validation, cellular electrophysiological data, and clinical correlation. METHODS The GENe-specific EnSemble grId Search framework was developed to identify high-performing machine learning models for RYR2, KCNQ1, KCNH2, and SCN5A using variant- and protein-specific inputs. Final models were applied to datasets of VUS identified from ClinVar and exome sequencing. Whole cell patch clamp and clinical correlation of selected VUS were performed. RESULTS The GENe-specific EnSemble grId Search models outperformed alternative methods, with area under the receiver operating characteristics up to 0.87, average precisions up to 0.83, and calibration slopes as close to 1.0 (perfect) as 1.04. Blinded voltage-clamp analysis of HEK293T cells expressing 2 predicted pathogenic variants in KCNQ1 each revealed an ≈80% reduction of peak Kv7.1 current compared with WT. Normal Kv7.1 function was observed in KCNQ1-V241I HEK cells as predicted. Though predicted benign, loss of Kv7.1 function was observed for KCNQ1-V106D HEK cells. Clinical correlation of 9/10 variants supported model predictions. CONCLUSIONS Gene-specific machine learning models may have a role in post-genetic testing diagnostic analyses by providing high performance prediction of variant pathogenicity.
|
4
|
Current status of PTMs structural databases: applications, limitations and prospects. Amino Acids 2022; 54:575-590. [PMID: 35020020 DOI: 10.1007/s00726-021-03119-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/26/2021] [Accepted: 12/20/2021] [Indexed: 12/11/2022]
Abstract
Protein 3D structures, determined by their amino acid sequences, underpin most crucial biological functions. Post-translational modifications (PTMs) play an essential role in regulating these functions by altering the physicochemical properties of proteins. Because of their importance, several PTM databases have been developed and released in recent decades, but very few of these databases incorporate real 3D structural data. Since PTMs influence the function of the protein and their aberrant states are frequently implicated in human diseases, providing structural insights to understand the influence and dynamics of PTMs is crucial for unraveling the underlying processes. This review is dedicated to the current status of databases providing 3D structural data on PTM sites in proteins. Some of these databases are general, covering multiple types of PTMs in different organisms, while others are specific to one particular type of PTM, class of proteins or organism. The importance of these databases is illustrated with two major types of in silico applications: predicting PTM sites in proteins using machine learning approaches and investigating protein structure-function relationships involving PTMs. Finally, these databases suffer from multiple problems and care must be taken when analyzing PTM data.
|
5
|
Expression, Interaction, and Role of Pseudogene Adh6-ps1 in Cancer Phenotypes. Bioinform Biol Insights 2021; 15:11779322211040591. [PMID: 34413637 PMCID: PMC8369952 DOI: 10.1177/11779322211040591] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/19/2021] [Accepted: 07/26/2021] [Indexed: 01/15/2023] Open
Abstract
Pseudogenes have been classified as functionless and their annotation is an ongoing problem. The Adh6-ps1-a mouse pseudogene belonging to the alcohol dehydrogenase gene complex (Adh) was analyzed to review the conservation, homology, expression, and interactions and identify any role it plays in disease phenotypes using bioinformatics databases. Results showed that Adh6-ps1 has 2 transcripts (processed and unprocessed) which may have emerged from a transposition and duplication event, respectively, and that induced inversions (Uox gene, In(3)11Rk) involving gene complexes associated with Adh6-ps1 have been implicated in a diverse range of diseases. Adh6-ps1 is highly conserved in vertebrates, particularly rodents, and expressed in the liver. The top 5 miRNA targets were Mir455, Mir511, Mir1903, Mir361, and Mir669o markers. While much is unknown about Mir1903 and Mir669o, the silencing of Mir455 and Mir511 is linked with hepatocellular carcinoma (HCC), and Mir361 is implicated in endometrial cancers. Given the identified miRNA interactions with Adh6-ps1 and its expression in HCC and reproductive systems, it may well have a role in tumorigenesis and disease phenotypes. Nonetheless, further studies are required to establish these facts to add to the growing efforts to understand pseudogenes and their potential involvement in disease conditions.
|
6
|
Chromosome Genome Assembly of the Leopard Coral Grouper (Plectropomus leopardus) With Nanopore and Hi-C Sequencing Data. Front Genet 2020; 11:876. [PMID: 32983227 PMCID: PMC7492660 DOI: 10.3389/fgene.2020.00876] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/06/2020] [Accepted: 07/17/2020] [Indexed: 11/13/2022] Open
|
7
|
On the causes of geographically heterogeneous parallel evolution in sticklebacks. Nat Ecol Evol 2020; 4:1105-1115. [DOI: 10.1038/s41559-020-1222-6] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 04/01/2019] [Accepted: 05/14/2020] [Indexed: 12/22/2022]
|
8
|
The Grass Carp Genomic Visualization Database (GCGVD): an informational platform for genome biology of grass carp. Int J Biol Sci 2019; 15:2119-2127. [PMID: 31592084 PMCID: PMC6775296 DOI: 10.7150/ijbs.32860] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/06/2019] [Accepted: 05/27/2019] [Indexed: 11/25/2022] Open
Abstract
With the release of the draft genome of the grass carp, research on grass carp genetics and on the molecular mechanisms underlying its economically valuable physiological behaviors has gained great attention. In this paper, we integrated a large number of genomic, genetic and other data resources and established a web-based grass carp genomic visualization database (GCGVD). To view these data more effectively, we visualized grass carp and zebrafish gene collinearity and the genetic linkage map using Scalable Vector Graphics (SVG) format in the browser, and genomic annotations using JBrowse. Furthermore, we carried out a preliminary study of whole-genome alternative splicing (AS) in the grass carp. The RNA-seq reads of 15 samples were aligned to the reference genome of the grass carp with Bowtie2. The RNA-seq reads of each sample and a density map of reads are also displayed in JBrowse. Additionally, we designed a universal grass carp genome annotation data model to improve retrieval speed and scalability. Compared with the previously published database GCGD, we added visualization of more genomic annotations, conserved domains and RNA-seq reads aligned to the reference genome. GCGVD can be accessed at http://122.112.216.104.
|
9
|
Determining the Likelihood of Variant Pathogenicity Using Amino Acid-level Signal-to-Noise Analysis of Genetic Variation. J Vis Exp 2019. [PMID: 30735170 DOI: 10.3791/58907] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023] Open
Abstract
Advancements in the cost and speed of next generation genetic sequencing have generated an explosion of clinical whole exome and whole genome testing. While this has led to increased identification of likely pathogenic mutations associated with genetic syndromes, it has also dramatically increased the number of incidentally found genetic variants of unknown significance (VUS). Determining the clinical significance of these variants is a major challenge for both scientists and clinicians. An approach to assist in determining the likelihood of pathogenicity is signal-to-noise analysis at the protein sequence level. This protocol describes a method for amino acid-level signal-to-noise analysis that leverages variant frequency at each amino acid position of the protein with known protein topology to identify areas of the primary sequence with elevated likelihood of pathologic variation (relative to population "background" variation). This method can identify amino acid residue location "hotspots" of high pathologic signal, which can be used to refine the diagnostic weight of VUSs such as those identified by next generation genetic testing.
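As a rough illustration of the idea described in this abstract, the sketch below computes a per-residue ratio of pathologic ("signal") to population ("noise") variant frequency. It is not the published protocol: the sliding window, the pseudocount, the threshold, and the function names `signal_to_noise` and `hotspots` are all assumptions made for this example.

```python
from typing import Dict, List


def signal_to_noise(
    case_counts: Dict[int, int],
    control_counts: Dict[int, int],
    n_cases: int,
    n_controls: int,
    protein_length: int,
    window: int = 5,
    pseudo: float = 1e-6,
) -> List[float]:
    """Per-residue ratio of windowed case frequency to windowed control frequency.

    Positions are 1-based amino acid indices. The window and pseudocount are
    illustrative assumptions, not values taken from the protocol.
    """
    half = window // 2
    ratios = []
    for pos in range(1, protein_length + 1):
        lo, hi = max(1, pos - half), min(protein_length, pos + half)
        # Sum variant counts in the window and normalise by cohort size.
        case_freq = sum(case_counts.get(p, 0) for p in range(lo, hi + 1)) / n_cases
        ctrl_freq = sum(control_counts.get(p, 0) for p in range(lo, hi + 1)) / n_controls
        # The pseudocount avoids division by zero where no background variants exist.
        ratios.append(case_freq / (ctrl_freq + pseudo))
    return ratios


def hotspots(ratios: List[float], threshold: float = 1.0) -> List[int]:
    """Return 1-based positions whose signal-to-noise ratio exceeds the threshold."""
    return [i + 1 for i, r in enumerate(ratios) if r > threshold]
```

With a made-up protein of 10 residues, pathologic variants at residue 3, and background variants at residue 8, `hotspots(signal_to_noise({3: 4}, {8: 2}, 10, 1000, 10, window=3))` flags the residues whose window covers position 3.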
|
10
|
Worldwide phylogeny of three-spined sticklebacks. Mol Phylogenet Evol 2018; 127:613-625. [DOI: 10.1016/j.ympev.2018.06.008] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 12/12/2017] [Revised: 04/16/2018] [Accepted: 06/04/2018] [Indexed: 11/23/2022]
|
11
|
Interpreting Incidentally Identified Variants in Genes Associated With Catecholaminergic Polymorphic Ventricular Tachycardia in a Large Cohort of Clinical Whole-Exome Genetic Test Referrals. Circ Arrhythm Electrophysiol 2017; 10:CIRCEP.116.004742. [PMID: 28404607 DOI: 10.1161/circep.116.004742] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 10/11/2016] [Accepted: 02/15/2017] [Indexed: 11/16/2022]
Abstract
BACKGROUND The rapid expansion of genetic testing has led to increased utilization of clinical whole-exome sequencing (WES). Clinicians and genetic researchers are being faced with assessing risk of disease vulnerability from incidentally identified genetic variants, which is typified by variants found in genes associated with sudden death-predisposing catecholaminergic polymorphic ventricular tachycardia (CPVT). We sought to determine whether incidentally identified variants in genes associated with CPVT from WES clinical testing represent disease-associated biomarkers. METHODS AND RESULTS CPVT-associated genes RYR2 and CASQ2 variants were identified in one of the world's largest collections of clinical WES referral tests (N=6517, Baylor Miraca Genetics Laboratories) and compared with a control cohort of ostensibly healthy individuals (N=60 706) and a case cohort of CPVT cases (N=155). Within the WES cohort, the rate of rare variants in CPVT-associated genes was 8.8% compared with 6.0% among controls and 60.0% among cases. There was a predominance of variants of undetermined significance (97.7%). After protein topology mapping, WES variants colocalized more frequently to residues with variants found in controls compared with cases. Retrospective clinical evaluation of individuals referred to our institution with WES-positive variants demonstrated no evidence of clinical CPVT in individuals with a low pretest clinical suspicion for CPVT. CONCLUSIONS The prevalence of incidentally identified CPVT-associated variants is ≈9% among WES tests. Variants of undetermined significance in CPVT-associated genes in WES genetic testing, in the absence of clinical suspicion for CPVT, are unlikely to represent markers of CPVT pathogenicity.
|
12
|
Normalized long read RNA sequencing in chicken reveals transcriptome complexity similar to human. BMC Genomics 2017; 18:323. [PMID: 28438136 PMCID: PMC5404281 DOI: 10.1186/s12864-017-3691-9] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Received: 11/10/2016] [Accepted: 04/06/2017] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Despite the significance of chicken as a model organism, our understanding of the chicken transcriptome is limited compared to human. This issue is common to all non-human vertebrate annotations due to the difficulty in transcript identification from short read RNAseq data. While previous studies have used single molecule long read sequencing for transcript discovery, they did not perform RNA normalization and 5'-cap selection, which may have resulted in lower transcriptome coverage and truncated transcript sequences. RESULTS We sequenced normalised chicken brain and embryo RNA libraries with Pacific Biosciences Iso-Seq. 5' cap selection was performed on the embryo library to provide methodological comparison. From these Iso-Seq sequencing projects, we have identified 60 k transcripts and 29 k genes within the chicken transcriptome. Of these, more than 20 k are novel lncRNA transcripts with ~3 k classified as sense exonic overlapping lncRNA, which is a class that is underrepresented in many vertebrate annotations. The relative proportion of alternative transcription events revealed striking similarities between the chicken and human transcriptomes while also providing explanations for previously observed genomic differences. CONCLUSIONS Our results indicate that the chicken transcriptome is similar in complexity to human, and provide insights into other vertebrate biology. Our methodology demonstrates the potential of Iso-Seq sequencing to rapidly expand our knowledge of transcriptomics.
|
13
|
Role of the LF-SINE-Derived Distal ISL1 Enhancer in Patients with Classic Bladder Exstrophy. J Pediatr Genet 2017; 6:169-173. [PMID: 28794909 DOI: 10.1055/s-0037-1602387] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/14/2017] [Accepted: 03/20/2017] [Indexed: 10/19/2022]
Abstract
A genome-wide association study and meta-analysis identified ISL1 as the first genome-wide significant susceptibility gene for classic bladder exstrophy (CBE). A short interspersed repetitive element (SINE), first detected in lobe-finned fishes (LF-SINE), was shown to drive Isl1 expression in embryonic mouse genital eminence. Hence, we assumed this enhancer a conclusive target for mutations associated with CBE formation and analyzed a cohort of 200 CBE patients. Although we identified two enhancer variants in five CBE patients, their clinical significance seems unlikely, implying that sequence variants in the ISL1 LF-SINE enhancer are not frequently associated with CBE.
|
14
|
Interpreting Incidentally Identified Variants in Genes Associated With Catecholaminergic Polymorphic Ventricular Tachycardia in a Large Cohort of Clinical Whole-Exome Genetic Test Referrals. Circ Arrhythm Electrophysiol 2017. [PMID: 28404607 DOI: 10.1161/circep.116.004742] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
|
15
|
Abstract
Protein domain identification and analysis are cornerstones of modern proteomics. The tools available to protein domain researchers offer a variety of approaches to understanding large protein domain families. Hidden Markov Models (HMMs) form the basis for identifying and categorizing evolutionarily linked protein domains. Here I describe the use of HMMs for predicting and identifying Src Homology 2 (SH2) domains within the proteome.
|
16
|
Elevated Gene Copy Number Does Not Always Explain Elevated Amylase Activities in Fishes. Physiol Biochem Zool 2016; 89:277-93. [PMID: 27327179 DOI: 10.1086/687288] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/03/2022]
Abstract
Amylase activity variation in the guts of several model organisms appears to be explained by amylase gene copy number variation. We tested the hypothesis that amylase gene copy number is always elevated in animals with high amylolytic activity. We therefore sequenced the amylase genes and examined amylase gene copy number in prickleback fishes (family Stichaeidae) with different diets including two species of convergently evolved herbivores with the elevated amylase activity phenotype. We found elevated amylase gene copy number (six haploid copies) with sequence variation among copies in one herbivore (Cebidichthys violaceus) and modest gene copy number (two to three haploid copies) with little sequence variation in the remaining taxa, which included herbivores, omnivores, and a carnivore. Few functional differences in amylase biochemistry were observed, and previous investigations showed similar digestibility among the convergently evolved herbivores with differing amylase genetics. Hence, the phenotype of elevated amylase activity can be achieved by different mechanisms (i.e., elevated expression of fewer genes, increased gene copy number, or expression of more efficient amylase proteins) with similar results. Phylogenetic and comparative genomic analyses of available fish amylase genes show mostly lineage-specific duplication events leading to gene copy number variation, although a whole-genome duplication event or chromosomal translocation may have produced multiple amylase copies in the Ostariophysi, again showing multiple routes to the same result.
|
17
|
Population genomic evidence for adaptive differentiation in the Baltic Sea herring. Mol Ecol 2016; 25:2833-52. [DOI: 10.1111/mec.13657] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 06/02/2015] [Revised: 03/31/2016] [Accepted: 04/14/2016] [Indexed: 01/30/2023]
|
18
|
Nuclear hormone receptor DHR96 mediates the resistance to xenobiotics but not the increased lifespan of insulin-mutant Drosophila. Proc Natl Acad Sci U S A 2016; 113:1321-6. [PMID: 26787908 PMCID: PMC4747718 DOI: 10.1073/pnas.1515137113] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Indexed: 11/18/2022] Open
Abstract
Lifespan of laboratory animals can be increased by genetic, pharmacological, and dietary interventions. Increased expression of genes involved in xenobiotic metabolism, together with resistance to xenobiotics, are frequent correlates of lifespan extension in the nematode worm Caenorhabditis elegans, the fruit fly Drosophila, and mice. The Green Theory of Aging suggests that this association is causal, with the ability of cells to rid themselves of lipophilic toxins limiting normal lifespan. To test this idea, we experimentally increased resistance of Drosophila to the xenobiotic dichlorodiphenyltrichloroethane (DDT), by artificial selection or by transgenic expression of a gene encoding a cytochrome P450. Although both interventions increased DDT resistance, neither increased lifespan. Furthermore, dietary restriction increased lifespan without increasing xenobiotic resistance, confirming that the two traits can be uncoupled. Reduced activity of the insulin/Igf signaling (IIS) pathway increases resistance to xenobiotics and extends lifespan in Drosophila, and can also increase longevity in C. elegans, mice, and possibly humans. We identified a nuclear hormone receptor, DHR96, as an essential mediator of the increased xenobiotic resistance of IIS mutant flies. However, the IIS mutants remained long-lived in the absence of DHR96 and the xenobiotic resistance that it conferred. Thus, in Drosophila IIS mutants, increased xenobiotic resistance and enhanced longevity are not causally connected. The frequent co-occurrence of the two traits may instead have evolved because, in nature, lowered IIS can signal the presence of pathogens. It will be important to determine whether enhanced xenobiotic metabolism is also a correlated, rather than a causal, trait in long-lived mice.
|
19
|
Gonadal transcriptomics elucidate patterns of adaptive evolution within marine rockfishes (Sebastes). BMC Genomics 2015; 16:656. [PMID: 26329285 PMCID: PMC4557894 DOI: 10.1186/s12864-015-1870-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/17/2014] [Accepted: 08/20/2015] [Indexed: 12/12/2022] Open
Abstract
Background The genetic mechanisms of speciation and adaptation in the marine environment are not well understood. The rockfish genus Sebastes provides a unique model system for studying adaptive evolution because of the extensive diversity found within this group, which includes morphology, ecology, and a broad range of life spans. Examples of adaptive radiations within marine ecosystems are considered an anomaly due to the absence of geographical barriers and the presence of gene flow. Using marine rockfishes, we identified signatures of natural selection from transcriptomes developed from gonadal tissue of two rockfish species (Sebastes goodei and S. saxicola). We predicted orthologous transcript pairs, and estimated their distributions of nonsynonymous (Ka) and synonymous (Ks) substitution rates. Results We identified 144 genes out of 1079 orthologous pairs under positive selection, of which 11 are functionally annotated to reproduction based on gene ontologies (GOs). One orthologous pair of the zona pellucida gene family, which is known for its role in the selection of sperm by oocytes, out of ten was identified to be evolving under positive selection. In addition to our results in the protein coding-regions of transcripts, we found substitution rates in 3’ and 5’ UTRs to be significantly lower than Ks substitution rates implying negative selection in these regions. Conclusions We were able to identify a series of candidate genes that are useful for the assessment of the critical genes that diverged and are responsible for the radiation within this genus. Genes associated with longevity hold potential for understanding the molecular mechanisms that have contributed to the radiation within this genus.
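The Ka and Ks rates mentioned in this abstract are commonly compared as a ratio. The sketch below is a minimal illustration that assumes the conventional reading (Ka/Ks above 1 suggests positive selection, below 1 suggests purifying selection); the abstract does not state the authors' exact criterion or any significance test, and the function names are hypothetical.

```python
def classify_selection(ka: float, ks: float) -> str:
    """Classify an orthologous pair by its Ka/Ks ratio.

    Assumes the conventional thresholds; real analyses normally add a
    statistical test, which is omitted here.
    """
    if ks == 0:
        # The ratio is undefined when no synonymous substitutions are observed.
        return "undefined"
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


def percent(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Share of orthologous pairs reported as under positive selection (144 of 1079).
share_positive = percent(144, 1079)
```

Here `share_positive` expresses the reported 144 of 1079 pairs as a percentage, which puts the count of positively selected genes in proportion to the full set of orthologous pairs.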
|
20
|
Analysis methods for studying the 3D architecture of the genome. Genome Biol 2015; 16:183. [PMID: 26328929 PMCID: PMC4556012 DOI: 10.1186/s13059-015-0745-7] [Citation(s) in RCA: 96] [Impact Index Per Article: 10.7] [Received: 05/11/2015] [Accepted: 08/10/2015] [Indexed: 11/10/2022] Open
Abstract
The rapidly increasing quantity of genome-wide chromosome conformation capture data presents great opportunities and challenges in the computational modeling and interpretation of the three-dimensional genome. In particular, with recent trends towards higher-resolution high-throughput chromosome conformation capture (Hi-C) data, the diversity and complexity of biological hypotheses that can be tested necessitates rigorous computational and statistical methods as well as scalable pipelines to interpret these datasets. Here we review computational tools to interpret Hi-C data, including pipelines for mapping, filtering, and normalization, and methods for confidence estimation, domain calling, visualization, and three-dimensional modeling.
|
21
|
Comparison of predicted and actual consequences of missense mutations. Proc Natl Acad Sci U S A 2015; 112:E5189-98. [PMID: 26269570 DOI: 10.1073/pnas.1511585112] [Citation(s) in RCA: 154] [Impact Index Per Article: 17.1] [Indexed: 12/31/2022] Open
Abstract
Each person's genome sequence has thousands of missense variants. Practical interpretation of their functional significance must rely on computational inferences in the absence of exhaustive experimental measurements. Here we analyzed the efficacy of these inferences in 33 de novo missense mutations revealed by sequencing in first-generation progeny of N-ethyl-N-nitrosourea-treated mice, involving 23 essential immune system genes. PolyPhen2, SIFT, MutationAssessor, Panther, CADD, and Condel were used to predict each mutation's functional importance, whereas the actual effect was measured by breeding and testing homozygotes for the expected in vivo loss-of-function phenotype. Only 20% of mutations predicted to be deleterious by PolyPhen2 (and 15% by CADD) showed a discernible phenotype in individual homozygotes. Half of all possible missense mutations in the same 23 immune genes were predicted to be deleterious, and most of these appear to become subject to purifying selection because few persist between separate mouse substrains, rodents, or primates. Because defects in immune genes could be phenotypically masked in vivo by compensation and environment, we compared inferences by the same tools with the in vitro phenotype of all 2,314 possible missense variants in TP53; 42% of mutations predicted by PolyPhen2 to be deleterious (and 45% by CADD) had little measurable consequence for TP53-promoted transcription. We conclude that for de novo or low-frequency missense mutations found by genome sequencing, half those inferred as deleterious correspond to nearly neutral mutations that have little impact on the clinical phenotype of individual cases but will nevertheless become subject to purifying selection.
|
22
|
Amplification of microsatellite repeat motifs is associated with the evolutionary differentiation and heterochromatinization of sex chromosomes in Sauropsida. Chromosoma 2015; 125:111-23. [DOI: 10.1007/s00412-015-0531-z] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 05/16/2015] [Revised: 07/01/2015] [Accepted: 07/03/2015] [Indexed: 01/05/2023]
|
23
|
Venus trap in the mouse embryo reveals distinct molecular dynamics underlying specification of first embryonic lineages. EMBO Rep 2015; 16:1005-21. [PMID: 26142281 DOI: 10.15252/embr.201540162] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 01/27/2015] [Accepted: 06/02/2015] [Indexed: 12/31/2022] Open
Abstract
Mammalian development begins with the segregation of embryonic and extra-embryonic lineages in the blastocyst. Recent studies revealed cell-to-cell gene expression heterogeneity and dynamic cell rearrangements during mouse blastocyst formation. Thus, mechanistic understanding of lineage specification requires quantitative description of gene expression dynamics at a single-cell resolution in living embryos. However, only a few fluorescent gene expression reporter mice are available and quantitative live image analysis is limited so far. Here, we carried out a fluorescence gene-trap screen and established reporter mice expressing Venus specifically in the first lineages. Lineage tracking, quantitative gene expression and cell position analyses allowed us to build a comprehensive lineage map of mouse pre-implantation development. Our systematic analysis revealed that, contrary to the available models, the timing and mechanism of lineage specification may be distinct between the trophectoderm and the inner cell mass. While expression of our trophectoderm-specific lineage marker is upregulated in outside cells upon asymmetric divisions at 8- and 16-cell stages, the inside-specific upregulation of the inner-cell-mass marker only becomes evident at the 64-cell stage. This study thus provides a framework toward systems-level understanding of embryogenesis marked by high dynamicity and stochastic variability.
|
24
|
Spatial complexity of character-based writing systems and arithmetic in primary school: a longitudinal study. Front Psychol 2015; 6:333. [PMID: 25859235 PMCID: PMC4374393 DOI: 10.3389/fpsyg.2015.00333] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/30/2014] [Accepted: 03/08/2015] [Indexed: 11/18/2022] Open
Abstract
Previous research has consistently found an association between spatial and mathematical abilities. We hypothesized that this link may partially explain the consistently observed advantage in mathematics demonstrated by East Asian children. Spatial complexity of the character-based writing systems may reflect or lead to a cognitive advantage relevant to mathematics. Seven hundred and twenty-one 6–9-year-old children from the UK and Russia were assessed on a battery of cognitive skills and arithmetic. The Russian children were recruited from specialist linguistic schools and divided into four different language groups, based on the second language they were learning (i.e., English, Spanish, Chinese, and Japanese). The UK children attended regular schools and were not learning any second language. The testing took place twice across the school year, once at the beginning, before the start of the second language acquisition, and once at the end of the year. The study had two aims: (1) to test whether spatial ability predicts mathematical ability in 7–9-year-old children across the samples; (2) to test whether acquisition and usage of a character-based writing system leads to an advantage in performance in arithmetic and related cognitive tasks. The longitudinal link from spatial ability to mathematics was found only in the Russian sample. The effect of second language acquisition on mathematics or other cognitive skills was negligible, although some effect of Chinese language on mathematical reasoning was suggested. Overall, the findings suggest that although spatial ability is related to mathematics at this age, one academic year of exposure to spatially complex writing systems is not enough to provide a mathematical advantage. Other educational and socio-cultural factors might play a greater role in explaining individual and cross-cultural differences in arithmetic at this age.
|
25
|
New Sicydiinae phylogeny (Teleostei: Gobioidei) inferred from mitochondrial and nuclear genes: Insights on systematics and ancestral areas. Mol Phylogenet Evol 2014; 70:260-71. [DOI: 10.1016/j.ympev.2013.09.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 01/22/2013] [Revised: 09/20/2013] [Accepted: 09/27/2013] [Indexed: 10/26/2022]
|
26
|
Evolution of the Cation Chloride Cotransporter Family: Ancient Origins, Gene Losses, and Subfunctionalization through Duplication. Mol Biol Evol 2013; 31:434-47. [DOI: 10.1093/molbev/mst225] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 01/13/2023] Open
|
27
|
Genomic divergence between nine- and three-spined sticklebacks. BMC Genomics 2013; 14:756. [PMID: 24188282 PMCID: PMC4046692 DOI: 10.1186/1471-2164-14-756] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 05/03/2013] [Accepted: 10/31/2013] [Indexed: 12/22/2022] Open
Abstract
Background Comparative genomics approaches help to shed light on evolutionary processes that shape differentiation between lineages. The nine-spined stickleback (Pungitius pungitius) is a close relative of the ecological ‘supermodel’ three-spined stickleback (Gasterosteus aculeatus). It is an emerging model system for evolutionary biology research but has garnered less attention and lacks extensive genomic resources. To expand on these resources and aid the study of sticklebacks in a phylogenetic framework, we characterized nine-spined stickleback transcriptomes from brain and liver using deep sequencing. Results We obtained nearly eight thousand assembled transcripts, of which 3,091 were assigned as putative one-to-one orthologs to genes found in the three-spined stickleback. These sequences were used for evaluating overall differentiation and substitution rates between nine- and three-spined sticklebacks, and to identify genes that are putatively evolving under positive selection. The synonymous substitution rate was estimated to be 7.1 × 10⁻⁹ per site per year between the two species, and a total of 165 genes showed patterns of adaptive evolution in one or both species. A few nine-spined stickleback contigs lacked an obvious ortholog in three-spined sticklebacks but were found to match genes in other fish species, suggesting several gene losses within 13 million years since the divergence of the two stickleback species. We identified 47 SNPs in 25 different genes that differentiate pond and marine ecotypes. We also identified 468 microsatellites that could be further developed as genetic markers in nine-spined sticklebacks. Conclusion With deep sequencing of nine-spined stickleback cDNA libraries, our study provides a significant increase in the number of gene sequences and microsatellite markers for this species, and identifies a number of genes showing patterns of adaptive evolution between nine- and three-spined sticklebacks.
We also report several candidate genes that might be involved in differential adaptation between marine and freshwater nine-spined sticklebacks. This study provides a valuable resource for future studies aiming to identify candidate genes underlying ecological adaptation in this and other stickleback species.
|
28
|
Gene conversions are under purifying selection in the carcinoembryonic antigen immunoglobulin gene families of primates. Genomics 2013; 102:301-9. [DOI: 10.1016/j.ygeno.2013.07.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 12/14/2012] [Revised: 06/25/2013] [Accepted: 07/08/2013] [Indexed: 11/20/2022]
|
29
|
Abstract
Heterogeneity among life traits in mammals has resulted in considerable phylogenetic conflict, particularly concerning the position of the placental root. Layered upon this is gene- and lineage-specific variation in amino acid substitution rates and compositional biases. Life trait variations that may impact upon mutational rates are longevity, metabolic rate, body size, and germ line generation time. Over the past 12 years, three main conflicting hypotheses have emerged for the placement of the placental root. These hypotheses place the Atlantogenata (common ancestor of Xenarthra plus Afrotheria), the Afrotheria, or the Xenarthra as the sister group to all other placental mammals. Model adequacy is critical for accurate tree reconstruction, and by failing to account for these compositional and character exchange heterogeneities across the tree and data set, previous studies have not provided a strongly supported hypothesis for the placental root. For the first time, models that accommodate both tree and data set heterogeneity have been applied to mammal data. Here, we show the impact of accurate model assignment and the importance of data sets in accommodating model parameters while maintaining the power to reject competing hypotheses. Through these sophisticated methods, we demonstrate the importance of model adequacy and data set power, and provide strong support for the Atlantogenata over other competing hypotheses for the position of the placental root.
|
30
|
Transcriptomics of morphological color change in polychromatic Midas cichlids. BMC Genomics 2013; 14:171. [PMID: 23497064 PMCID: PMC3623868 DOI: 10.1186/1471-2164-14-171] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Received: 11/03/2012] [Accepted: 03/06/2013] [Indexed: 12/30/2022] Open
Abstract
Background Animal pigmentation has received much attention in evolutionary biology research due to its strong implications for adaptation and speciation. However, apart from a few cases the genetic changes associated with these evolutionary processes remain largely unknown. The Midas cichlid fish from Central America are an ideal model system for investigating pigmentation traits that may also play a role in speciation. Most Midas cichlids maintain their melanophores and exhibit a grayish (normal) color pattern throughout their lives. A minority of individuals, however, undergo color change and exhibit a distinctive gold or even white coloration in adulthood. The ontogenetic color change in the Midas cichlids may also shed light on the molecular mechanisms underlying pigmentation disorders in humans. Results Here we use next-generation sequencing (Illumina) RNAseq analyses to compare skin transcriptome-wide expression levels in three distinct stages of color transformation in Midas cichlids. cDNA libraries of scale tissue, for six biological replicates of each group, were generated and sequenced using Illumina technology. Using a combination of three differential expression (DE) analyses we identified 46 candidate genes that showed DE between the color morphs. We find evidence for two key DE patterns: a) genes involved in melanosomal pathways are up-regulated in normally pigmented fish; and b) immediate early and inflammatory response genes were up-regulated in transitional fish, a response that parallels some human skin disorders such as melanoma formation and psoriasis. One of the DE genes segregates with the gold phenotype in a genetic cross and might be associated with incipient speciation in this highly “species-rich” lineage of cichlids. Conclusions Using transcriptomic analyses we successfully identified key expression differences between different color morphs of Midas cichlid fish. 
These differentially expressed genes have important implications for our understanding of the molecular mechanisms underlying speciation in this lineage of extremely young species since they mate strongly assortatively, and new species may arise by sexual selection due to this color polymorphism. Some of the human orthologues of the genes identified here may also be involved in pigmentation differences and diseases and therefore provide genetic markers for the detection of human pigmentation disorders.
|
31
|
Intra-genomic GC heterogeneity in sauropsids: evolutionary insights from cDNA mapping and GC(3) profiling in snake. BMC Genomics 2012; 13:604. [PMID: 23140509 PMCID: PMC3549455 DOI: 10.1186/1471-2164-13-604] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 01/28/2012] [Accepted: 10/24/2012] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Extant sauropsids (reptiles and birds) are divided into two major lineages, the lineage of Testudines (turtles) and Archosauria (crocodilians and birds) and the lineage of Lepidosauria (tuatara, lizards, worm lizards and snakes). Karyotypes of these sauropsidan groups generally consist of macrochromosomes and microchromosomes. In chicken, microchromosomes exhibit a higher GC-content than macrochromosomes. To examine the pattern of intra-genomic GC heterogeneity in lepidosaurian genomes, we constructed a cytogenetic map of the Japanese four-striped rat snake (Elaphe quadrivirgata) with 183 cDNA clones by fluorescence in situ hybridization, and examined the correlation between the GC-content of exonic third codon positions (GC3) of the genes and the size of chromosomes on which the genes were localized. RESULTS Although GC3 distribution of snake genes was relatively homogeneous compared with those of the other amniotes, microchromosomal genes showed significantly higher GC3 than macrochromosomal genes, as in chicken. Our snake cytogenetic map also identified several conserved segments between the snake macrochromosomes and the chicken microchromosomes. Cross-species comparisons revealed that most snake orthologs in such macrochromosomal segments were GC-poor (GC3 < 50%) whereas the chicken orthologs in microchromosomes were relatively GC-rich (GC3 ≥ 50%). CONCLUSION Our results suggest that the chromosome size-dependent GC heterogeneity had already occurred before the lepidosaur-archosaur split, 275 million years ago. This character was probably present in the common ancestor of lepidosaurs but lost in the lineage leading to Anolis during the diversification of lepidosaurs. We also identified several genes whose GC-content might have been influenced by the size of the chromosomes on which they were harbored over the course of sauropsid evolution.
|
32
|
Epitope-specificity of recombinant antibodies reveals promiscuous peptide-binding properties. Protein Sci 2012; 21:1897-910. [PMID: 23034898 DOI: 10.1002/pro.2173] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 07/06/2012] [Accepted: 09/26/2012] [Indexed: 01/25/2023]
Abstract
Protein-peptide interactions are common and essential for numerous cellular processes, and are frequently explored in broad applications within biology, medicine, and proteomics. Therefore, understanding the molecular mechanism(s) of protein-peptide recognition, specificity, and binding interactions will be essential. In this study, we report the first detailed analysis of antibody-peptide interaction characteristics, by combining large-scale experimental peptide binding data with the structural analysis of eight human recombinant antibodies and numerous peptides, targeting tryptic mammalian and eukaryote proteomes. The results consistently revealed that promiscuous peptide-binding interactions, that is, both specific and degenerate binding, were exhibited by all antibodies, and the discovery was corroborated by orthogonal data, indicating that this might be a general phenomenon for low-affinity antibody-peptide interactions. The molecular mechanism for the degenerate peptide-binding specificity appeared to be executed through the use of 2-3 semi-conserved anchor residues in the C-terminal part of the peptides, analogous to the mechanism utilized by the major histocompatibility complex-peptide complexes. In the long term, this knowledge will be instrumental for advancing our fundamental understanding of protein-peptide interactions, as well as for designing, generating, and applying peptide-specific antibodies, or peptide-binding proteins in general, in various biotechnical and medical applications.
|
33
|
Parsing parallel evolution: ecological divergence and differential gene expression in the adaptive radiations of thick-lipped Midas cichlid fishes from Nicaragua. Mol Ecol 2012; 22:650-69. [DOI: 10.1111/mec.12034] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Received: 03/30/2012] [Revised: 06/11/2012] [Accepted: 07/26/2012] [Indexed: 01/31/2023]
|
34
|
Inter-tissue networks between the basal forebrain, hippocampus, and prefrontal cortex in a model for depression caused by disturbed sleep. J Neurogenet 2012; 26:397-412. [PMID: 22783900 DOI: 10.3109/01677063.2012.694932] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
Disturbances in sleep are encountered in the majority of patients with depressive disorder. To elucidate the molecular mechanisms behind this relationship, we examined gene expression changes in a rodent model for disturbed sleep and depression. The animals were treated with daily injections of clomipramine to affect their sleep during early infancy. This early interference with sleep is known to induce depression-like behavior in adult animals. After 2 weeks of treatment, the change in gene expression was examined using the Affymetrix Rat 230.2 chip. We studied the gene expression in the basal forebrain, hippocampus, and frontal cortex and combined the results to reveal the otherwise indissectible networks between and around the tissues. The major disrupted pathways between the three brain areas were related to synaptic transmission, regulation of translation, and ubiquitinylation. The involved pathways were within the cellular components of the axons, growth cones, melanosomes, and pigment granules. A network analysis allowing for additional interactors, in the form of chemicals or gene products, revealed a disturbed communication network between the different brain areas. This disturbed network is centered around serotonin, Mn(II), and Rhoa. The findings elucidate inter-tissue pathways and networks in the brain that are involved in sleep and mood regulation. The findings are of utmost interest: some are quite predictable and obvious, but others are novel or have only been proposed by rare theoretical speculations (such as the melanosome and Mn(II) involvement). Equally important as the findings are the methods described in this article. In this study, we present two novel, simple ways to perform systems biology analysis based on gene expression array data. We used two already existing tools in a new way and, by careful planning of the input data, managed to extrapolate intricate hidden inter-tissue networks to build a molecular picture of disease.
|
35
|
Colon cancer associated genes exhibit signatures of positive selection at functionally significant positions. BMC Evol Biol 2012; 12:114. [PMID: 22788692 PMCID: PMC3563467 DOI: 10.1186/1471-2148-12-114] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 02/01/2012] [Accepted: 06/22/2012] [Indexed: 12/17/2022] Open
Abstract
Background Cancer, much like most human disease, is routinely studied by utilizing model organisms. Of these model organisms, mice are often dominant. However, our assumptions of functional equivalence fail to consider the opportunity for divergence conferred by ~180 Million Years (MY) of independent evolution between these species. For a given set of human disease related genes, it is therefore important to determine if functional equivalency has been retained between species. In this study we test the hypothesis that cancer associated genes have different patterns of substitution akin to adaptive evolution in different mammal lineages. Results Our analysis of the current literature and colon cancer databases identified 22 genes exhibiting colon cancer associated germline mutations. We identified orthologs for these 22 genes across a set of high coverage (>6X) vertebrate genomes. Analysis of these orthologous datasets revealed significant levels of positive selection. Evidence of lineage-specific positive selection was identified in 14 genes in both ancestral and extant lineages. Lineage-specific positive selection was detected in the ancestral Euarchontoglires and Hominidae lineages for STK11, in the ancestral primate lineage for CDH1, in the ancestral Murinae lineage for both SDHC and MSH6 genes and the ancestral Muridae lineage for TSC1. Conclusion Identifying positive selection in the Primate, Hominidae, Muridae and Murinae lineages suggests an ancestral functional shift in these genes between the rodent and primate lineages. Analyses such as this, which combine evolutionary theory and predictions with medically relevant data, can thus provide us with important clues for modeling human diseases.
|
36
|
New model of cystic fibrosis transmembrane conductance regulator proposes active channel-like conformation. J Chem Inf Model 2012; 52:1842-53. [PMID: 22747419 DOI: 10.1021/ci2005884] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 12/23/2022]
Abstract
The cystic fibrosis transmembrane conductance regulator (CFTR) is an unusual ABC transporter, functioning as a chloride channel critical for fluid homeostasis in multiple organs. Disruption of CFTR function is associated with cystic fibrosis, making it an attractive therapeutic target. In addition, CFTR blockers are being developed as potential antidiarrheals. CFTR drug discovery is hampered by the lack of high-resolution structural data, and considerable efforts have been invested in modeling the channel structure. Although previously published CFTR models that have been made publicly available mostly agree with experimental data relating to the overall structure, they present the channel in an outward-facing conformation that does not agree with expected properties of a "channel-like" structure. Here, we make available a model of CFTR in such a "channel-like" conformation, derived by a unique modeling approach combining restrained homology modeling and ROSETTA refinement. In contrast to others, the present model is in agreement with expected channel properties such as pore shape, dimensions, solvent accessibility, and experimentally derived distances. We have used the model to explore the interaction of open channel blockers within the pore, revealing a common binding mode and ionic interaction with K95, in agreement with experimental data. The binding site was further validated using a virtual screening enrichment experiment, suggesting the model might be suitable for drug discovery. In addition, we subjected the model to a molecular dynamics simulation, revealing previously unaddressed salt-bridge interactions that may be important for structure stability and pore-lining residues that may take part in Cl⁻ conductance.
|
37
|
Highly interconnected genes in disease-specific networks are enriched for disease-associated polymorphisms. Genome Biol 2012; 13:R46. [PMID: 22703998 PMCID: PMC3446318 DOI: 10.1186/gb-2012-13-6-r46] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 02/27/2012] [Revised: 05/25/2012] [Accepted: 06/15/2012] [Indexed: 02/07/2023] Open
Abstract
Background Complex diseases are associated with altered interactions between thousands of genes. We developed a novel method to identify and prioritize disease genes, which was generally applicable to complex diseases. Results We identified modules of highly interconnected genes in disease-specific networks derived from integrating gene-expression and protein interaction data. We examined if those modules were enriched for disease-associated SNPs, and could be used to find novel genes for functional studies. First, we analyzed publicly available gene expression microarray and genome-wide association study (GWAS) data from 13, highly diverse, complex diseases. In each disease, highly interconnected genes formed modules, which were significantly enriched for genes harboring disease-associated SNPs. To test if such modules could be used to find novel genes for functional studies, we repeated the analyses using our own gene expression microarray and GWAS data from seasonal allergic rhinitis. We identified a novel gene, FGF2, whose relevance was supported by functional studies using combined small interfering RNA-mediated knock-down and gene expression microarrays. The modules in the 13 complex diseases analyzed here tended to overlap and were enriched for pathways related to oncological, metabolic and inflammatory diseases. This suggested that this union of the modules would be associated with a general increase in susceptibility for complex diseases. Indeed, we found that this union was enriched with GWAS genes for 145 other complex diseases. Conclusions Modules of highly interconnected complex disease genes were enriched for disease-associated SNPs, and could be used to find novel genes for functional studies.
|
38
|
GPMiner: an integrated system for mining combinatorial cis-regulatory elements in mammalian gene group. BMC Genomics 2012; 13 Suppl 1:S3. [PMID: 22369687 PMCID: PMC3587379 DOI: 10.1186/1471-2164-13-s1-s3] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 12/02/2022] Open
Abstract
Background Sequence features in promoter regions are involved in regulating gene transcription initiation. Although numerous computational methods have been developed for predicting transcriptional start sites (TSSs) or transcription factor (TF) binding sites (TFBSs), they do not consider some important regulatory features such as CpG islands, tandem repeats, the TATA box, CCAAT box, GC box, over-represented oligonucleotides, DNA stability, and GC content. Additionally, the combinatorial interaction of TFs regulates the gene group that is associated with the same expression pattern. To investigate gene transcriptional regulation, an integrated system that annotates regulatory features in a promoter sequence and detects co-regulation of TFs in a group of genes is needed. Results This work identifies TSSs and regulatory features in a promoter sequence, and recognizes co-occurrence of cis-regulatory elements in co-expressed genes using a novel system. Three well-known TSS prediction tools are incorporated with orthologous conserved features, such as CpG islands, nucleotide composition, over-represented hexamer nucleotides, and DNA stability, to construct the novel Gene Promoter Miner (GPMiner) using a support vector machine (SVM). According to five-fold cross-validation results, the predictive sensitivity and specificity are both roughly 80%. The proposed system allows users to input a group of gene names/symbols, enabling the co-occurrence of TFBSs to be determined. Additionally, an input sequence can also be analyzed for homogeneity of experimental mammalian promoter sequences, and conserved regulatory features between homologous promoters can be observed through cross-species analysis. After identifying promoter regions, regulatory features are visualized graphically to facilitate gene promoter observations. Conclusions The GPMiner, which has a user-friendly input/output interface, has numerous benefits in analyzing human and mouse promoters.
The proposed system is freely available at http://GPMiner.mbc.nctu.edu.tw/.
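For readers unfamiliar with the two performance figures quoted in this abstract, the following is a minimal sketch of how sensitivity and specificity are computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only to reproduce a value of 80%; they are not GPMiner's data, and this is not its evaluation code.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true promoters that are detected.
    specificity = TN / (TN + FP): share of non-promoters correctly rejected.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts for one cross-validation fold (illustration only).
    sens, spec = sensitivity_specificity(tp=80, fn=20, tn=80, fp=20)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In a five-fold cross-validation, these two ratios would typically be computed on each held-out fold and then averaged.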
|
39
|
Abstract
The future of biology will be increasingly driven by the fundamental paradigm shift from hypothesis-driven research to data-driven discovery research employing the growing volume of biological data coupled to experimental testing of new discoveries. But hardware and software limitations in the current workflow infrastructure make it impossible or intractable to use real data from disparate sources for large-scale biological research. We identify key technological developments needed to enable this paradigm shift, involving (1) the ability to store and manage extremely large datasets which are dispersed over a wide geographical area, (2) development of novel analysis and visualization tools which are capable of operating on enormous data resources without overwhelming researchers with unusable information, and (3) formalisms for integrating mathematical models of biosystems from the molecular level to the organism population level. This will require the development of algorithms and tools which efficiently utilize high-performance compute power and large storage infrastructures. The end result will be the ability of a researcher to integrate complex data from many different sources with simulations to analyze a given system at a wide range of temporal and spatial scales in a single conceptual model.
|
40
|
i-ADHoRe 3.0--fast and sensitive detection of genomic homology in extremely large data sets. Nucleic Acids Res 2011; 40:e11. [PMID: 22102584 PMCID: PMC3258164 DOI: 10.1093/nar/gkr955] [Citation(s) in RCA: 138] [Impact Index Per Article: 10.6] [Indexed: 11/14/2022] Open
Abstract
Comparative genomics is a powerful means to gain insight into the evolutionary processes that shape the genomes of related species. As the number of sequenced genomes increases, the development of software to perform accurate cross-species analyses becomes indispensable. However, many implementations that have the ability to compare multiple genomes exhibit unfavorable computational and memory requirements, limiting the number of genomes that can be analyzed in one run. Here, we present a software package to unveil genomic homology based on the identification of conservation of gene content and gene order (collinearity), i-ADHoRe 3.0, and its application to eukaryotic genomes. The use of efficient algorithms and support for parallel computing enable the analysis of large-scale data sets. Unlike other tools, i-ADHoRe can process the Ensembl data set, containing 49 species, in 1 h. Furthermore, the profile search is more sensitive in detecting degenerate genomic homology than chaining pairwise collinearity information based on transitive homology. From ultra-conserved collinear regions between mammals and birds, by integrating coexpression information and protein–protein interactions, we identified more than 400 regions in the human genome showing significant functional coherence. The different algorithmic improvements ensure that i-ADHoRe 3.0 will remain a powerful tool to study genome evolution.
|
41
|
Fox gene loci in Takifugu rubripes and Tetraodon nigroviridis genomes and comparison with those of medaka and zebrafish genomes. Genome 2011; 54:965-72. [PMID: 22073989 DOI: 10.1139/g11-065] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/29/2023]
Abstract
Members of the Fox gene family of transcriptional regulators are essential for animal development and have been extensively studied in vertebrates. The mouse and human genomes contain at least 40 FOX genes, which are divided into 19 subclasses based on the sequence similarity of the highly conserved forkhead domain. Using the genome sequences of Takifugu rubripes and Tetraodon nigroviridis, we examined the genomic complement of fox genes in these organisms to gain insight into the evolutionary relationships of this gene family. We identified 53 fox genes in the Tetraodon nigroviridis and Takifugu rubripes genomes by searching for the forkhead domain. These genes are divided into 18 subclasses as follows: 8 fox genes in subclass O; 6 in subclass P; 4 in subclasses D, J, and N; 3 in subclasses A, B, C, E, F, and I; 2 in subclasses K, L, and Q; and 1 in subclasses G, H, M, and R. Together with the forkhead domain sequences of human, chicken, frog, zebrafish, medaka, and Caenorhabditis elegans, the phylogenetic relationships of the fox genes in Takifugu rubripes and Tetraodon nigroviridis were analyzed and compared. The gene structure, general features, and three-dimensional model of these genes were also discussed.
|
42
|
Abstract
Dissecting the genetic control of complex trait variation remains very challenging, despite many advances in technology. The aim of this study was to use a major growth quantitative trait locus (QTL) in chickens mapped to chromosome 4 as a model for a targeted approach to dissect the QTL. We applied a variant of the genetical genomics approach to investigate genome-wide gene expression differences between two contrasting genotypes of a marked QTL. This targeted approach allows the direct quantification of the link between the genotypes and the genetic responses, thus narrowing the QTL-phenotype gap using fewer samples (i.e. microarrays) compared with the genome-wide genetical genomics studies. Four differentially expressed genes were localized under the region of the QTL. One of these genes is a potential positional candidate gene (AADAT) that affects lysine and tryptophan metabolism and has alternative splicing variants between the two genotypes. In addition, the lysine and glycolysis metabolism pathways were significantly enriched for differentially expressed genes across the genome. The targeted approach provided a complementary route to fine mapping of QTL by characterizing the local and the global downstream effects of the QTL and thus generating further hypotheses about the action of that QTL.
|
43
|
Genome-Wide Analysis of Sox Genes in Medaka (Oryzias latipes) and Their Expression Pattern in Embryonic Development. Cytogenet Genome Res 2011; 134:283-94. [DOI: 10.1159/000329480] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Accepted: 04/22/2011] [Indexed: 12/23/2022] Open
|
44
|
The Structural Biology Knowledgebase: a portal to protein structures, sequences, functions, and methods. Journal of Structural and Functional Genomics 2011. [PMID: 21472436 DOI: 10.1007/s10969-011-9106-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 09/29/2022]
Abstract
The Protein Structure Initiative's Structural Biology Knowledgebase (SBKB, URL: http://sbkb.org) is an open web resource designed to turn the products of the structural genomics and structural biology efforts into knowledge that can be used by the biological community to understand living systems and disease. Here we will present examples on how to use the SBKB to enable biological research. For example, a protein sequence or Protein Data Bank (PDB) structure ID search will provide a list of related protein structures in the PDB, associated biological descriptions (annotations), homology models, structural genomics protein target status, experimental protocols, and the ability to order available DNA clones from the PSI:Biology-Materials Repository. A text search will find publication and technology reports resulting from the PSI's high-throughput research efforts. Web tools that aid in research, including a system that accepts protein structure requests from the community, will also be described. Created in collaboration with the Nature Publishing Group, the Structural Biology Knowledgebase monthly update also provides a research library, editorials about new research advances, news, and an events calendar to present a broader view of structural genomics and structural biology.
|
45
|
The Structural Biology Knowledgebase: a portal to protein structures, sequences, functions, and methods. Journal of Structural and Functional Genomics 2011; 12:45-54. [PMID: 21472436 PMCID: PMC3123456 DOI: 10.1007/s10969-011-9106-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Received: 12/22/2010] [Accepted: 03/21/2011] [Indexed: 01/10/2023]
|
46
|
Abstract
Ras proteins control many aspects of eukaryotic cell homeostasis by switching between active (GTP-bound) and inactive (GDP-bound) conformations, a reaction catalyzed by GTPase exchange factors (GEF) and GTPase activating proteins (GAP) regulators, respectively. Here, we show that the complexity, measured as number of genes, of the canonical Ras switch genetic system (including Ras, RasGEF, RasGAP and RapGAP families) from 24 eukaryotic organisms is correlated with their genome size and is inversely correlated to their evolutionary distances from humans. Moreover, different gene subfamilies within the Ras switch have contributed unevenly to the module’s expansion and speciation processes during eukaryote evolution. The Ras system remarkably reduced its genetic expansion after the split of the Euteleostomi clade and presently looks practically crystallized in mammals. Supporting evidence points to gene duplication as the predominant mechanism generating functional diversity in the Ras system, stressing the leading role of gene duplication in the Ras family expansion. Domain fusion and alternative splicing are significant sources of functional diversity in the GAP and GEF families but their contribution is limited in the Ras family. An evolutionary model of the Ras system expansion is proposed suggesting an inherent ‘decision making’ topology with the GEF input signal integrated by a homologous molecular mechanism and bifurcation in GAP signaling propagation.
|
47
|
Dynamic expression patterns of 6-O endosulfatases during zebrafish development suggest a subfunctionalisation event for sulf2. Dev Dyn 2011; 239:3312-23. [PMID: 20981828 DOI: 10.1002/dvdy.22456] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022] Open
Abstract
The 6-O-endosulfatase enzymes (Sulfs) edit the final sulfation pattern and function of heparan sulfate (HS) by removal of 6-O-sulfate groups from the chain. To date, two mammalian sulf genes have been identified that regulate many signalling pathways during embryonic development. In zebrafish, a sulf1 ortholog and duplicate copies of the mammalian sulf2 gene, sulf2a and sulf2, have been identified, which contain conserved motifs characteristic of vertebrate sulf genes. Zebrafish sulf1 and sulf2a are broadly expressed in the central nervous system (CNS) and non-neuronal tissue including heart, somite boundaries, olfactory system, and otic vesicle, whereas sulf2 expression is almost entirely restricted to the CNS. The duplicate copies of sulf2 have distinct expression patterns, which together mirror that of mouse sulf2, suggesting that duplication in the teleost lineage has been followed by subfunctionalisation, whereby both genes need to be preserved by selection to ensure that the ancestral gene's expression profile and function are maintained.
|
48
|
A transcriptomic scan for positively selected genes in two closely related marine fishes: Sebastes caurinus and S. rastrelliger. Mar Genomics 2011; 4:93-8. [PMID: 21620330 DOI: 10.1016/j.margen.2011.02.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 11/17/2010] [Revised: 02/02/2011] [Accepted: 02/05/2011] [Indexed: 01/09/2023]
Abstract
Comparative genomic analyses can provide valuable insight into functional evolutionary divergence among closely related species. Here we employ a comparative evolutionary analysis of expressed sequence tags (ESTs) from two closely related species of marine fishes (genus Sebastes, rockfish). Sebastes is a highly diverse group of marine fishes that inhabit a wide array of marine habitats, and the study of this group can provide insights into speciation in the marine environment. ESTs were developed for S. caurinus (23,668 from brain, kidney, and spleen tissues) and S. rastrelliger (11,207 from brain and pituitary tissues). Following assembly we were able to identify, with high confidence, 257 orthologous sequence pairs between the two species through a reciprocal best-hit BLAST search. An analysis of functional divergence between orthologs revealed that 19.46% had Ka/Ks values greater than 0.5 and 8.17% had Ka/Ks values greater than one, identifying a large pool of candidate genes to further study adaptive divergence in the group. Genes with elevated Ka/Ks values belonged to the following functional categories: immune function, metabolism, longevity, and reproductive behavior, indicating that adaptive divergence in these functional groups may be important in the diversification of this group of fishes. This study provides the groundwork to better understand the molecular evolution of genes involved in a radiation of marine fishes.
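The reciprocal best-hit step and the Ka/Ks thresholds mentioned in this abstract can be illustrated with a minimal Python sketch. The function names, the dictionary-based input, and the toy identifiers are hypothetical and are not taken from the study; in practice the best hits would come from parsed BLAST output and the Ka/Ks values from a separate substitution-rate estimation step.

```python
from typing import Dict, List, Tuple


def reciprocal_best_hits(
    a_to_b: Dict[str, str], b_to_a: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    pairs = []
    for a, b in a_to_b.items():
        # Keep the pair only if the best hit is mutual.
        if b_to_a.get(b) == a:
            pairs.append((a, b))
    return sorted(pairs)


def fraction_above(kaks: Dict[Tuple[str, str], float], threshold: float) -> float:
    """Fraction of ortholog pairs whose Ka/Ks value exceeds the threshold."""
    if not kaks:
        return 0.0
    return sum(1 for value in kaks.values() if value > threshold) / len(kaks)


# Toy data with hypothetical sequence identifiers (not from the study).
best_in_rastrelliger = {"cau1": "ras1", "cau2": "ras2", "cau3": "ras2"}
best_in_caurinus = {"ras1": "cau1", "ras2": "cau2"}
orthologs = reciprocal_best_hits(best_in_rastrelliger, best_in_caurinus)

# Hypothetical Ka/Ks values for the two retained pairs.
toy_kaks = {orthologs[0]: 0.2, orthologs[1]: 1.3}
share_above_half = fraction_above(toy_kaks, 0.5)
share_above_one = fraction_above(toy_kaks, 1.0)
```

In the toy data, `cau3` is dropped because its best hit `ras2` points back to `cau2`, which is the behaviour that makes reciprocal best hits a conservative ortholog filter.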
|
49
|
Abstract
The RNA Pol II transcription complex pauses just downstream of the promoter in a significant fraction of human genes. The local features of genomic structure that contribute to pausing have not been defined. Here, we show that genes that pause are more G-rich within the region flanking the transcription start site (TSS) than RefSeq genes or non-paused genes. We show that enrichment of binding motifs for common transcription factors, such as SP1, may account for G-richness upstream but not downstream of the TSS. We further show that pausing correlates with the presence of a GrIn1 element, an element bearing one or more G4 motifs at the 5′-end of the first intron, on the non-template DNA strand. These results suggest potential roles for dynamic G4 DNA and G4 RNA structures in cis-regulation of pausing, and thus genome-wide regulation of gene expression, in human cells.
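The two sequence features discussed in this abstract, G-richness and G4 motifs, can be sketched in a few lines of Python. The regular expression below is a commonly used G4 pattern (four runs of at least three guanines separated by loops of 1 to 7 nucleotides); it is an assumption here, since the abstract does not state the motif definition used, and the function names and example sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed G4 pattern: G{3+} loop{1-7} G{3+} loop{1-7} G{3+} loop{1-7} G{3+}.
# The abstract does not specify the definition used in the study.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def g_fraction(seq: str) -> float:
    """Fraction of positions in the sequence that are G."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def find_g4_motifs(seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of non-overlapping G4 pattern matches.

    Only the strand passed in is scanned; to examine the other strand,
    pass its reverse complement.
    """
    return [match.span() for match in G4_PATTERN.finditer(seq.upper())]


# Hypothetical example sequence containing one match.
example = "ATGGGTAGGGCTGGGAAGGGTC"
example_spans = find_g4_motifs(example)
example_g_fraction = g_fraction(example)
```

Applied to windows around a transcription start site, `g_fraction` gives a simple G-richness measure, and `find_g4_motifs` flags candidate G4-forming stretches for closer inspection.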
|
50
|
Development and application of bovine and porcine oligonucleotide arrays with protein-based annotation. J Biomed Biotechnol 2010; 2010:453638. [PMID: 21197395 PMCID: PMC3010673 DOI: 10.1155/2010/453638] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/03/2010] [Accepted: 11/01/2010] [Indexed: 12/11/2022] Open
Abstract
The design of oligonucleotide sequences for the detection of gene expression in species with disparate volumes of genome and EST sequence information has been broadly studied. However, a congruous strategy has yet to emerge to allow the design of sensitive and specific gene expression detection probes. This study explores the use of a phylogenomic approach to align transcribed sequences to vertebrate protein sequences for the detection of gene families to design genomewide 70-mer oligonucleotide probe sequences for bovine and porcine. The bovine array contains 23,580 probes that target the transcripts of 16,341 genes, about 72% of the total number of bovine genes. The porcine array contains 19,980 probes targeting 15,204 genes, about 76% of the genes in the Ensembl annotation of the pig genome. An initial experiment using the bovine array demonstrates the specificity and sensitivity of the array.
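The coverage figures in this abstract imply approximate gene totals and probe-to-gene ratios, which the short Python sketch below works out. The function names are hypothetical, and because the percentages are quoted as "about", the derived totals are rough estimates rather than values reported by the authors.

```python
def implied_total(genes_targeted: int, fraction_covered: float) -> int:
    """Estimate the total gene count implied by a coverage fraction."""
    return round(genes_targeted / fraction_covered)


def probes_per_gene(probes: int, genes_targeted: int) -> float:
    """Average number of probes per targeted gene, rounded to two decimals."""
    return round(probes / genes_targeted, 2)


# Figures quoted in the abstract.
bovine_total = implied_total(16_341, 0.72)    # roughly 22,696 genes
porcine_total = implied_total(15_204, 0.76)   # roughly 20,005 genes

bovine_ratio = probes_per_gene(23_580, 16_341)   # about 1.44 probes per gene
porcine_ratio = probes_per_gene(19_980, 15_204)  # about 1.31 probes per gene
```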
|