1
|
DNA sequence-dependent chromatin architecture and nuclear hubs formation. Sci Rep 2019; 9:14646. [PMID: 31601866 PMCID: PMC6787200 DOI: 10.1038/s41598-019-51036-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/20/2019] [Accepted: 09/18/2019] [Indexed: 02/08/2023] Open
Abstract
In this study, by exploring chromatin conformation capture data, we show that DNA sequence composition contributes to the nuclear segregation of Topologically Associated Domains (TADs). GC-peaks and valleys of TADs strongly influence interchromosomal interactions and chromatin 3D structure. To gain insight into the compositional and functional constraints associated with chromatin interactions and TAD formation, we analysed intra-TAD and intra-loop GC variations. This led to the identification of clear GC-gradients, along which the density of genes, super-enhancers, transcriptional activity, and CTCF binding site occupancy co-vary non-randomly. Further, the analysis of DNA base composition of nucleolar aggregates and nuclear speckles showed strong sequence-dependent effects. We conjecture that dynamic DNA binding affinity and flexibility underlie the emergence of chromatin condensates, whose growth is likely promoted in mechanically soft (GC-rich) regions of the lowest chromatin and nucleosome densities. As a practical perspective, the strong linear association between sequence composition and interchromosomal contacts can help define consensus chromatin interactions, which in turn may be used to study alternative states of chromatin architecture.
|
2
|
A common genomic code for chromatin architecture and recombination landscape. PLoS One 2019; 14:e0213278. [PMID: 30865674 PMCID: PMC6415826 DOI: 10.1371/journal.pone.0213278] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/29/2018] [Accepted: 02/18/2019] [Indexed: 12/14/2022] Open
Abstract
Recent findings established a link between DNA sequence composition and interphase chromatin architecture and explained the evolutionary conservation of TADs (Topologically Associated Domains) and LADs (Lamina Associated Domains) in mammals. This prompted us to analyse conformation capture and recombination rate data to study the relationship between chromatin architecture and recombination landscape of human and mouse genomes. The results reveal that: (1) low recombination domains and blocks of elevated linkage disequilibrium tend to coincide with TADs and isochores, indicating co-evolving regulatory elements and genes in insulated neighbourhoods; (2) double strand break (DSB) and recombination frequencies increase in the short loops of GC-rich TADs, whereas recombination cold spots are typical of LADs; and (3) the binding and loading of proteins that are critical for DSB and meiotic recombination (SPO11, DMC1, H3K4me3 and PRDM9) are higher in GC-rich TADs. One explanation for these observations is that the occurrence of DSB and recombination in meiotic cells is associated with compositional and epigenetic features (genomic code) that influence DNA stiffness/flexibility and appear to be similar to those guiding the chromatin architecture in the interphase nucleus of pre-leptotene cells.
|
3
|
Abstract
Genetic Generalized Epilepsy (GGE) and benign epilepsy with centro-temporal spikes or Rolandic Epilepsy (RE) are common forms of genetic epilepsies. Rare copy number variants have been recognized as important risk factors in brain disorders. We performed a systematic survey of rare deletions affecting protein-coding genes derived from exome data of patients with common forms of genetic epilepsies. We analysed exomes from 390 European patients (196 GGE and 194 RE) and 572 population controls to identify low-frequency genic deletions. We found that 75 (32 GGE and 43 RE) patients out of 390, i.e. ~19%, carried rare genic deletions. In particular, large deletions (>400 kb) represent a higher burden in both GGE and RE syndromes as compared to controls. The detected low-frequency deletions (1) share genes with brain-expressed exons that are under negative selection, (2) overlap with known autism and epilepsy-associated candidate genes, (3) are enriched for CNV intolerant genes recorded by the Exome Aggregation Consortium (ExAC) and (4) coincide with likely disruptive de novo mutations from the NPdenovo database. Employing several knowledge databases, we discuss the most prominent epilepsy candidate genes and their protein-protein networks for GGE and RE.
|
4
|
The Diverging Routes of BORIS and CTCF: An Interactomic and Phylogenomic Analysis. Life (Basel) 2018; 8:life8010004. [PMID: 29385718 PMCID: PMC5871936 DOI: 10.3390/life8010004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/23/2017] [Revised: 01/25/2018] [Accepted: 01/25/2018] [Indexed: 12/11/2022] Open
Abstract
The CCCTC-binding factor (CTCF) is multi-functional, ubiquitously expressed, and highly conserved from Drosophila to human. It has important roles in transcriptional insulation and the formation of a high-dimensional chromatin structure. CTCF has a paralog called “Brother of Regulator of Imprinted Sites” (BORIS) or “CTCF-like” (CTCFL). It binds DNA at sites similar to those of CTCF. However, the expression profiles of the two proteins are quite different. We investigated the evolutionary trajectories of the two proteins after the duplication event using a phylogenomic and interactomic approach. We find that CTCF has 52 direct interaction partners while CTCFL only has 19. Almost all interactors already existed before the emergence of CTCF and CTCFL. The unique secondary loss of CTCF from several nematodes is paralleled by a loss of two of its interactors, the polycomb repressive complex subunit SuZ12 and the multifunctional transcription factor TYY1. In contrast to earlier studies reporting the absence of BORIS from birds, we present evidence for a multigene synteny block containing CTCFL that is conserved in mammals, reptiles, and several species of birds, indicating that not the entire lineage of birds experienced a loss of CTCFL. Within this synteny block, BORIS and its genomic neighbors seem to be partitioned into two nested chromatin loops. The high expression of SPO11, RAE1, RBM38, and PMEPA1 in male tissues suggests a possible link between CTCFL, meiotic recombination, and fertility-associated phenotypes. Using the 65,700 exomes and the 1000 genomes data, we observed a higher number of intergenic, non-synonymous, and loss-of-function mutations in CTCFL than in CTCF, suggesting a reduced strength of purifying selection, perhaps due to less functional constraint.
|
5
|
Exome-wide analysis of mutational burden in patients with typical and atypical Rolandic epilepsy. Eur J Hum Genet 2018; 26:258-264. [PMID: 29358611 DOI: 10.1038/s41431-017-0034-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/08/2017] [Revised: 09/27/2017] [Accepted: 10/06/2017] [Indexed: 12/30/2022] Open
Abstract
Rolandic epilepsy (RE) is the most common focal epilepsy in childhood. To date no hypothesis-free exome-wide mutational screen has been conducted for RE and atypical RE (ARE). Here we report on whole-exome sequencing of 194 unrelated patients with RE/ARE and 567 ethnically matched population controls. We identified an exome-wide significantly enriched burden for deleterious and loss-of-function variants only for the established RE/ARE gene GRIN2A. The statistical significance of the enrichment disappeared after removing ARE patients. For several disease-related gene-sets, an odds ratio >1 was detected for loss-of-function variants.
|
6
|
Abstract
A recent investigation showed the existence of correlations between the architectural features of mammalian interphase chromosomes and the compositional properties of isochores. This result prompted us to compare maps of the Topologically Associating Domains (TADs) and of the Lamina Associated Domains (LADs) with the corresponding isochore maps of mouse and human chromosomes. This approach revealed that: 1) TADs and LADs correspond to isochores, i.e., isochores are the genomic units that underlie chromatin domains; 2) the conservation of TADs and LADs in mammalian genomes is explained by the evolutionary conservation of isochores; 3) chromatin domains corresponding to GC-poor isochores (e.g., LADs) show not only self-interactions but also intrachromosomal interactions with other domains also corresponding to GC-poor isochores even if located far away; in contrast, chromatin domains corresponding to GC-rich isochores (e.g., TADs) show more localized chromosomal interactions, many of which are inter-chromosomal. In conclusion, this investigation establishes a link between DNA sequences and chromatin architecture, explains the evolutionary conservation of TADs and LADs and provides new information on the spatial distribution of GC-poor/gene-poor and GC-rich/gene-rich chromosomal regions in the interphase nucleus.
|
7
|
A genomic view on epilepsy and autism candidate genes. Genomics 2016; 108:31-6. [PMID: 26772991 DOI: 10.1016/j.ygeno.2016.01.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 08/27/2015] [Revised: 12/15/2015] [Accepted: 01/01/2016] [Indexed: 01/25/2023]
Abstract
Epilepsy is a common complex disorder most frequently associated with psychiatric and neurological diseases. Massive parallel sequencing of individual or cohort genomes and exomes led to the identification of several disease-associated genes. We review here the candidate genes in epilepsy genetics with a focus on exome and gene panel data. Together with the examination of brain-expressed genes and the postsynaptic proteome, the results show that: (1) non-metabolic epilepsy and autism candidate genes tend to be AT-rich and (2) large transcript size and local AT-richness are characteristic features of genes involved in developmental brain disorders and synaptic functions. These results point to the preferential location of core epilepsy and autism candidate genes in late replicating, GC-poor chromosomal regions (isochores). These results indicate that the genomic alterations leading to some brain disorders are confined to responsive chromatin areas harboring brain critical genes.
|
9
|
Plant genetics. Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science 2014; 345:950-3. [PMID: 25146293 DOI: 10.1126/science.1253435] [Citation(s) in RCA: 1362] [Impact Index Per Article: 136.2] [Indexed: 12/14/2022]
Abstract
Oilseed rape (Brassica napus L.) was formed ~7500 years ago by hybridization between B. rapa and B. oleracea, followed by chromosome doubling, a process known as allopolyploidy. Together with more ancient polyploidizations, this conferred an aggregate 72× genome multiplication since the origin of angiosperms and high gene content. We examined the B. napus genome and the consequences of its recent duplication. The constituent An and Cn subgenomes are engaged in subtle structural, functional, and epigenetic cross-talk, with abundant homeologous exchanges. Incipient gene loss and expression divergence have begun. Selection in B. napus oilseed types has accelerated the loss of glucosinolate genes, while preserving expansion of oil biosynthesis genes. These processes provide insights into allopolyploid evolution and its relationship with crop domestication and improvement.
|
10
|
Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication. Nat Biotechnol 2014; 32:656-62. [PMID: 24908277 PMCID: PMC4113729 DOI: 10.1038/nbt.2906] [Citation(s) in RCA: 320] [Impact Index Per Article: 32.0] [Received: 10/09/2013] [Accepted: 04/14/2014] [Indexed: 01/21/2023]
Abstract
Cultivated citrus are selections from, or hybrids of, wild progenitor species whose identities and contributions to citrus domestication remain controversial. Here we sequence and compare citrus genomes--a high-quality reference haploid clementine genome and mandarin, pummelo, sweet-orange and sour-orange genomes--and show that cultivated types derive from two progenitor species. Although cultivated pummelos represent selections from one progenitor species, Citrus maxima, cultivated mandarins are introgressions of C. maxima into the ancestral mandarin species Citrus reticulata. The most widely cultivated citrus, sweet orange, is the offspring of previously admixed individuals, but sour orange is an F1 hybrid of pure C. maxima and C. reticulata parents, thus implying that wild mandarins were part of the early breeding germplasm. A Chinese wild 'mandarin' diverges substantially from C. reticulata, thus suggesting the possibility of other unrecognized wild citrus species. Understanding citrus phylogeny through genome analysis clarifies taxonomic relationships and facilitates sequence-directed genetic improvement.
|
11
|
DEPDC5 mutations in genetic focal epilepsies of childhood. Ann Neurol 2014; 75:788-92. [PMID: 24591017 DOI: 10.1002/ana.24127] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 10/20/2013] [Revised: 02/18/2014] [Accepted: 02/26/2014] [Indexed: 01/25/2023]
Abstract
Recent studies reported DEPDC5 loss-of-function mutations in different focal epilepsy syndromes. Here we identified 1 predicted truncation and 2 missense mutations in 3 children with rolandic epilepsy (3 of 207). In addition, we identified 3 families with unclassified focal childhood epilepsies carrying predicted truncating DEPDC5 mutations (3 of 82). The detected variants were all novel, inherited, and present in all tested affected (n=11) and in 7 unaffected family members, indicating low penetrance. Our findings extend the phenotypic spectrum associated with mutations in DEPDC5 and suggest that rolandic epilepsy, albeit rarely, and other nonlesional childhood epilepsies are among the associated syndromes.
|
12
|
The streamlined genome of Phytomonas spp. relative to human pathogenic kinetoplastids reveals a parasite tailored for plants. PLoS Genet 2014; 10:e1004007. [PMID: 24516393 PMCID: PMC3916237 DOI: 10.1371/journal.pgen.1004007] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 05/14/2013] [Accepted: 10/23/2013] [Indexed: 11/18/2022] Open
Abstract
Members of the family Trypanosomatidae infect many organisms, including animals, plants and humans. Plant-infecting trypanosomes are grouped under the single genus Phytomonas, failing to reflect the wide biological and pathological diversity of these protists. While some Phytomonas spp. multiply in the latex of plants, or in fruit or seeds without apparent pathogenicity, others colonize the phloem sap and afflict plants of substantial economic value, including the coffee tree, coconut and oil palms. Plant trypanosomes have not been studied extensively at the genome level, a major gap in understanding and controlling pathogenesis. We describe the genome sequences of two plant trypanosomatids, one pathogenic isolate from a Guianan coconut and one non-symptomatic isolate from Euphorbia collected in France. Although these parasites have extremely distinct pathogenic impacts, very few genes are unique to either, with the vast majority of genes shared by both isolates. Significantly, both Phytomonas spp. genomes consist essentially of single copy genes for the bulk of their metabolic enzymes, whereas other trypanosomatids e.g. Leishmania and Trypanosoma possess multiple paralogous genes or families. Indeed, comparison with other trypanosomatid genomes revealed a highly streamlined genome, encoding for a minimized metabolic system while conserving the major pathways, and with retention of a full complement of endomembrane organelles, but with no evidence for functional complexity. Identification of the metabolic genes of Phytomonas provides opportunities for establishing in vitro culturing of these fastidious parasites and new tools for the control of agricultural plant disease. Some plant trypanosomes, single-celled organisms living in phloem sap, are responsible for important palm diseases, inducing frequent expensive and toxic insecticide treatments against their insect vectors. Other trypanosomes multiply in latex tubes without detriment to their host. 
Despite the wide range of behaviors and impacts, these trypanosomes have been rather unceremoniously lumped into a single genus: Phytomonas. A battery of molecular probes has been used for their characterization but no clear phylogeny or classification has been established. We have sequenced the genomes of a pathogenic phloem-specific Phytomonas from a diseased South American coconut palm and a latex-specific isolate collected from an apparently healthy wild euphorb in the south of France. Upon comparison with each other and with human pathogenic trypanosomes, both Phytomonas revealed distinctive compact genomes, consisting essentially of single-copy genes, with the vast majority of genes shared by both isolates irrespective of their effect on the host. A strong cohort of enzymes in the sugar metabolism pathways was consistent with the nutritional environments found in plants. The genetic nuances may reveal the basis for the behavioral differences between these two unique plant parasites, and indicate the direction of our future studies in search of effective treatment of the crop disease parasites.
|
13
|
The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 2012; 488:213-7. [PMID: 22801500 DOI: 10.1038/nature11241] [Citation(s) in RCA: 603] [Impact Index Per Article: 50.3] [Received: 02/10/2012] [Accepted: 05/18/2012] [Indexed: 01/17/2023]
Abstract
Bananas (Musa spp.), including dessert and cooking types, are giant perennial monocotyledonous herbs of the order Zingiberales, a sister group to the well-studied Poales, which include cereals. Bananas are vital for food security in many tropical and subtropical countries and the most popular fruit in industrialized countries. The Musa domestication process started some 7,000 years ago in Southeast Asia. It involved hybridizations between diverse species and subspecies, fostered by human migrations, and selection of diploid and triploid seedless, parthenocarpic hybrids thereafter widely dispersed by vegetative propagation. Half of the current production relies on somaclones derived from a single triploid genotype (Cavendish). Pests and diseases have gradually become adapted, representing an imminent danger for global banana production. Here we describe the draft sequence of the 523-megabase genome of a Musa acuminata doubled-haploid genotype, providing a crucial stepping-stone for genetic improvement of banana. We detected three rounds of whole-genome duplications in the Musa lineage, independently of those previously described in the Poales lineage and the one we detected in the Arecales lineage. This first monocotyledon high-continuity whole-genome sequence reported outside Poales represents an essential bridge for comparative genome analysis in plants. As such, it clarifies commelinid-monocotyledon phylogenetic relationships, reveals Poaceae-specific features and has led to the discovery of conserved non-coding sequences predating monocotyledon-eudicotyledon divergence.
|
14
|
Evaluation of Accuracy and Performance of a Fast Monte Carlo Code for Dose Calculation in Proton Therapy. Clin Oncol (R Coll Radiol) 2011. [DOI: 10.1016/j.clon.2011.01.435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
15
|
Transcription factor families inferred from genome sequences of photosynthetic stramenopiles. The New Phytologist 2010; 188:52-66. [PMID: 20646219 DOI: 10.1111/j.1469-8137.2010.03371.x] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Indexed: 05/05/2023]
Abstract
• By comparative analyses we identify lineage-specific diversity in transcription factors (TFs) from stramenopile (or heterokont) genome sequences. We compared a pennate (Phaeodactylum tricornutum) and a centric diatom (Thalassiosira pseudonana) with those of other stramenopiles (oomycetes, Pelagophyceae, and Phaeophyceae (Ectocarpus siliculosus)) as well as to that of Emiliania huxleyi, a haptophyte that is evolutionarily related to the stramenopiles. • We provide a detailed description of diatom TF complements and report numerous peculiarities: in both diatoms, the heat shock factor (HSF) family is overamplified and constitutes the most abundant class of TFs; Myb and C2H2-type zinc finger TFs are the two most abundant TF families encoded in all the other stramenopile genomes investigated; the presence of diatom and lineage-specific gene fusions, in particular a class of putative photoreceptors with light-sensitive Per-Arnt-Sim (PAS) and DNA-binding (basic-leucine zipper, bZIP) domains and an HSF-AP2 domain fusion. • Expression data analysis shows that many of the TFs studied are transcribed and may be involved in specific responses to environmental stimuli. • Evolutionary and functional relevance of these observations are discussed.
|
16
|
Digital expression profiling of novel diatom transcripts provides insight into their biological functions. Genome Biol 2010; 11:R85. [PMID: 20738856 PMCID: PMC2945787 DOI: 10.1186/gb-2010-11-8-r85] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Received: 10/23/2009] [Revised: 05/11/2010] [Accepted: 08/25/2010] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Diatoms represent the predominant group of eukaryotic phytoplankton in the oceans and are responsible for around 20% of global photosynthesis. Two whole genome sequences are now available. Notwithstanding, our knowledge of diatom biology remains limited because only around half of their genes can be ascribed a function based on homology-based methods. High throughput tools are needed, therefore, to associate functions with diatom-specific genes. RESULTS We have performed a systematic analysis of 130,000 ESTs derived from Phaeodactylum tricornutum cells grown in 16 different conditions. These include different sources of nitrogen, different concentrations of carbon dioxide, silicate and iron, and abiotic stresses such as low temperature and low salinity. Based on unbiased statistical methods, we have catalogued transcripts with similar expression profiles and identified transcripts differentially expressed in response to specific treatments. Functional annotation of these transcripts provides insights into expression patterns of genes involved in various metabolic and regulatory pathways and into the roles of novel genes with unknown functions. Specific growth conditions could be associated with enhanced gene diversity, known gene product functions, and over-representation of novel transcripts. Comparative analysis of data from the other sequenced diatom, Thalassiosira pseudonana, helped identify several unique diatom genes that are specifically regulated under particular conditions, thus facilitating studies of gene function, genome annotation and the molecular basis of species diversity. CONCLUSIONS The digital gene expression database represents a new resource for identifying candidate diatom-specific genes involved in processes of major ecological relevance.
|
17
|
The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature 2010; 465:617-21. [PMID: 20520714 DOI: 10.1038/nature09016] [Citation(s) in RCA: 518] [Impact Index Per Article: 37.0] [Received: 11/09/2009] [Accepted: 03/15/2010] [Indexed: 01/05/2023]
Abstract
Brown algae (Phaeophyceae) are complex photosynthetic organisms with a very different evolutionary history to green plants, to which they are only distantly related. These seaweeds are the dominant species in rocky coastal ecosystems and they exhibit many interesting adaptations to these, often harsh, environments. Brown algae are also one of only a small number of eukaryotic lineages that have evolved complex multicellularity (Fig. 1). We report the 214 million base pair (Mbp) genome sequence of the filamentous seaweed Ectocarpus siliculosus (Dillwyn) Lyngbye, a model organism for brown algae, closely related to the kelps (Fig. 1). Genome features such as the presence of an extended set of light-harvesting and pigment biosynthesis genes and new metabolic processes such as halide metabolism help explain the ability of this organism to cope with the highly variable tidal environment. The evolution of multicellularity in this lineage is correlated with the presence of a rich array of signal transduction genes. Of particular interest is the presence of a family of receptor kinases, as the independent evolution of related molecules has been linked with the emergence of multicellularity in both the animal and green plant lineages. The Ectocarpus genome sequence represents an important step towards developing this organism as a model species, providing the possibility to combine genomic and genetic approaches to explore these and other aspects of brown algal biology further.
|
18
|
Comparative ecophysiology and genomics of the toxic unicellular alga Fibrocapsa japonica. The New Phytologist 2010; 185:446-458. [PMID: 19912547 DOI: 10.1111/j.1469-8137.2009.03074.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]
Abstract
• Ten axenic cultures, referred to as Fibrocapsa japonica, were studied for their morphology, pigment composition, toxicity and phylogeny. • Morphologically, all 10 accessions were similar and displayed equivalent pigment contents. We identified chlorophylls a and c, beta-carotene and fucoxanthin as the dominant pigments, together with xanthophyll cycle carotenoids likely to be involved in photoprotection. • All 10 accessions caused mortality of the brine shrimp Artemia salina and displayed haemolytic and haemagglutination activities toward sheep erythrocytes. Our results indicate that haemagglutination activity is a key component of F. japonica toxicity. • Examination of a collection of F. japonica expressed sequence tags (ESTs) has led to the identification of candidate genes involved in F. japonica toxicity and/or growth control.
|
19
|
SU-GG-T-326: A Fast Monte Carlo Code for Proton Transport in Radiation Therapy Based On Pre-Calculated Tracks From MCNPX. Med Phys 2008. [DOI: 10.1118/1.2962078] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022] Open
|
20
|
Abstract
Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in land plants. We sequenced the approximately 120-megabase nuclear genome of Chlamydomonas and performed comparative phylogenomic analyses, identifying genes encoding uncharacterized proteins that are likely associated with the function and biogenesis of chloroplasts or eukaryotic flagella. Analyses of the Chlamydomonas genome advance our understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.
|
22
|
SU-FF-T-349: PMC, a New Fast Monte Carlo Code for Radiation Therapy. Med Phys 2007. [DOI: 10.1118/1.2761013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022] Open
|
23
|
Simple proteomic checks for detecting noncoding RNA. Proteomics 2007. [DOI: 10.1002/pmic.200790033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
|
24
|
The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci U S A 2007; 104:7705-10. [PMID: 17460045 PMCID: PMC1863510 DOI: 10.1073/pnas.0611046104] [Citation(s) in RCA: 417] [Impact Index Per Article: 24.5] [Indexed: 11/18/2022] Open
Abstract
The smallest known eukaryotes, at approximately 1 μm in diameter, are Ostreococcus tauri and related species of marine phytoplankton. The genome of Ostreococcus lucimarinus has been completed and compared with that of O. tauri. This comparison reveals surprising differences across orthologous chromosomes in the two species, from highly syntenic chromosomes in most cases to chromosomes with almost no similarity. Species divergence in these phytoplankton is occurring through multiple mechanisms acting differently on different chromosomes and likely including acquisition of new genes through horizontal gene transfer. We speculate that this latter process may be involved in altering the cell-surface characteristics of each species. In addition, the genome of O. lucimarinus provides insights into the unique metal metabolism of these organisms, which are predicted to have a large number of selenocysteine-containing proteins. Selenoenzymes are more catalytically active than similar enzymes lacking selenium, and thus the cell may require less of that protein. As reported here, selenoenzymes, novel fusion proteins, and loss of some major protein families including ones associated with chromatin are likely important adaptations for achieving a small cell size.
|
25
|
Abstract
Proper validation can accelerate sequence-based discovery of proteins and protein-coding genes. Databases currently contain a backlog of experimentally unverified gene models and tentative assignments of observed transcripts to coding or noncoding RNA. We present and apply a general principle, founded on base composition and the genetic code and validated here by bulk 2-D gels, that can improve the reliability of such classifications and of the algorithms or pipelines that lead to them.
|
26
|
Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc Natl Acad Sci U S A 2006; 103:11647-52. [PMID: 16868079 PMCID: PMC1544224 DOI: 10.1073/pnas.0604795103] [Citation(s) in RCA: 528] [Impact Index Per Article: 29.3] [Received: 01/04/2006] [Indexed: 02/06/2023] Open
Abstract
The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry.
|
27
|
Sci-Fri AM General-04: High Contrast Imaging Using Orthogonal Bremsstrahlung Beams: An Experimental Study of Radiation Quality. Med Phys 2006. [DOI: 10.1118/1.2244669] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022] Open
|
28
|
SU-CC-ValA-09: Radiation Quality in High Contrast Imaging with Orthogonal Bremsstrahlung Beams. Med Phys 2006. [DOI: 10.1118/1.2240127] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022] Open
|
29
|
The evolution of introns in human duplicated genes. Gene 2006; 365:41-7. [PMID: 16356663 DOI: 10.1016/j.gene.2005.09.038] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 04/22/2005] [Revised: 07/07/2005] [Accepted: 09/07/2005] [Indexed: 11/17/2022]
Abstract
In previous work [Jabbari, K., Rayko, E., Bernardi, G., 2003. The major shifts of human duplicated genes. Gene 317, 203-208], we investigated the fate of ancient duplicated genes after the compositional transitions that occurred between the genomes of cold- and warm-blooded vertebrates. We found that the majority of duplicated copies were transposed to the "ancestral genome core", the gene-dense genome compartment that underwent a GC enrichment at the compositional transitions. Here, we studied the consequences of the events just outlined on the introns of duplicated genes. We found that, while intron number was highly conserved, total intron size (the sum of intron sizes within any given gene) was smaller in the GC-rich copies compared to the GC-poor copies, especially in dispersed copies (i.e., copies located on different chromosomes or chromosome arms). GC-rich copies also showed higher densities of CpG islands and Alus, whereas GC-poor copies were characterized by higher densities of LINEs. The features of the copies that underwent the compositional transition and became GC-richer are suggestive of, or related to, functional changes.
|
30
|
SU-FF-T-312: Feasibility Study of Orthogonal Bremsstrahlung Beams for Improved Radiation Therapy Imaging. Med Phys 2005. [DOI: 10.1118/1.1998041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022] Open
|
31
|
Comparative genomics of the pennate diatom Phaeodactylum tricornutum. Plant Physiol 2005; 137:500-13. [PMID: 15665249 PMCID: PMC1065351 DOI: 10.1104/pp.104.052829] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 09/03/2004] [Revised: 11/24/2004] [Accepted: 11/25/2004] [Indexed: 05/04/2023]
Abstract
Diatoms are one of the most important constituents of phytoplankton communities in aquatic environments, but in spite of this, only recently have large-scale diatom-sequencing projects been undertaken. With the genome of the centric species Thalassiosira pseudonana available since mid-2004, accumulating sequence information for a pennate model species appears a natural subsequent aim. We have generated over 12,000 expressed sequence tags (ESTs) from the pennate diatom Phaeodactylum tricornutum, and upon assembly into a nonredundant set, 5,108 sequences were obtained. Significant similarity (E < 1E-04) to entries in the GenBank nonredundant protein database, the COG profile database, and the Pfam protein domains database was detected, respectively, in 45.0%, 21.5%, and 37.1% of the nonredundant collection of sequences. This information was employed to functionally annotate the P. tricornutum nonredundant set and to create an internet-accessible queryable diatom EST database. The nonredundant collection was then compared to the putative complete proteomes of the green alga Chlamydomonas reinhardtii, the red alga Cyanidioschyzon merolae, and the centric diatom T. pseudonana. A number of intriguing differences were identified between the pennate and the centric diatoms concerning activities of relevance for general cell metabolism, e.g. genes involved in carbon-concentrating mechanisms, cytosolic acetyl-Coenzyme A production, and fructose-1,6-bisphosphate metabolism. Finally, codon usage and utilization of C and G relative to gene expression (as measured by EST redundancy) were studied, and preferences for utilization of C and CpG doublets were noted among the P. tricornutum EST coding sequences.
|
32
|
Comparative genomics of Anopheles gambiae and Drosophila melanogaster. Gene 2004; 333:183-6. [PMID: 15177694 DOI: 10.1016/j.gene.2004.02.038] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 12/17/2003] [Accepted: 02/10/2004] [Indexed: 10/26/2022]
Abstract
A sequence analysis of the genomes of Anopheles gambiae and Drosophila melanogaster reveals that Anopheles DNA is more heterogeneous and GC-richer than Drosophila DNA. The gene concentration across the Anopheles genome is characterized by low levels in the GC-poor part of the genome and a 3-fold increase in the GC-richest part; this gene density gradient is approximately half that of Drosophila. GC levels of introns and flanking sequences are correlated with GC3 values (GC levels of third codon positions) of the corresponding genes with slopes much lower than unity; in other words, most introns and intergenic sequences are less GC-rich than the corresponding GC3 values. These findings, which describe a compositional shift within Diptera, are of interest because of their parallels with the well-studied major shift in vertebrates.
|
33
|
Body temperature and evolutionary genomics of vertebrates: a lesson from the genomes of Takifugu rubripes and Tetraodon nigroviridis. Gene 2004; 333:179-81. [PMID: 15177693 DOI: 10.1016/j.gene.2004.02.048] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 12/15/2003] [Accepted: 02/05/2004] [Indexed: 11/28/2022]
Abstract
In this paper, we provide evidence for the body temperature effect on the formation of GC-rich isochores, by analysing genomic sequences from two puffer fishes living at different temperatures. The higher body temperature of Tetraodon nigroviridis compared to Takifugu rubripes (ΔT approximately 15 °C) appears to be the cause of a higher compositional heterogeneity of the former due to the formation of GC-rich regions. Such an effect does not only concern large DNA segments but also coding sequences.
|
34
|
Abstract
Between one third and one half of the proposed rice genes appear to have no homologs in other species, including Arabidopsis. Compositional considerations, and a comparison of curated rice sequences with ex novo predictions, suggest that many or most of the putative genes without homologs may be false positive predictions, i.e., sequences that are never translated into functional proteins in vivo.
|
35
|
Abstract
The existence of a well conserved linear relationship between GC levels of genes' second and third codon positions (GC2, GC3) prompted us to focus on the landscape, or joint distribution, spanned by these two variables. In human, well curated coding sequences now cover at least 15%-30% of the estimated total gene set. Our analysis of the landscape defined by this gene set revealed not only the well documented linear crest, but also the presence of several peaks and valleys along that crest, a property that was also indicated in two other warm-blooded vertebrates represented by large gene databases, that is, mouse and chicken. GC2 is the sum of eight amino acid frequencies, whereas GC3 is linearly related to the GC level of the chromosomal region containing the gene. The landscapes therefore portray relations between proteins and the DNA environments of the genes that encode them.
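The statement that GC2 is the sum of eight amino acid frequencies can be checked directly against the standard genetic code: every codon for Ala, Pro, Thr, Ser, Cys, Trp, Arg, and Gly carries G or C at its second position, and no other sense codon does. The following is a minimal Python sketch, not the authors' method; the function names and the example sequence are hypothetical, and it assumes an in-frame coding sequence without a stop codon.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The eight amino acids whose codons all have G or C at the second position.
GC2_AMINO_ACIDS = set("APTSCWRG")


def codons(cds):
    """Split an in-frame coding sequence into complete codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def gc_at_position(cds, pos):
    """Fraction of codons with G or C at codon position pos (1, 2, or 3)."""
    cs = codons(cds)
    return sum(c[pos - 1] in "GC" for c in cs) / len(cs)


def gc2_from_protein(cds):
    """GC2 computed as the summed frequency of the eight amino acids."""
    aas = [CODON_TABLE[c] for c in codons(cds)]
    return sum(aa in GC2_AMINO_ACIDS for aa in aas) / len(aas)


# Hypothetical example sequence (seven sense codons, no stop codon).
example = "ATGGCTTGCAAAGGGTTTCGT"
gc2 = gc_at_position(example, 2)
gc3 = gc_at_position(example, 3)
gc2_aa = gc2_from_protein(example)
```

Here `gc2` and `gc2_aa` agree by construction, whereas `gc3` depends only on synonymous choices and is therefore free to track the base composition of the surrounding DNA.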
|
36
|
|
37
|
Abstract
An analysis of dinucleotide frequencies was carried out on DNAs from insects and mammals, as well as on large DNA sequences from the genomes of Drosophila melanogaster, Anopheles gambiae, puffer fish (Takifugu rubripes), zebra fish (Danio rerio) and human. These organisms were chosen because Drosophila and Anopheles DNAs have an extremely low level of methylation, human DNA a high level and fish DNA a two-fold higher level compared to human. The results indicate that: (i) CpG deficiency and the corresponding TpG (CpA) excess show no correlation with the level of DNA methylation; indeed, genomes endowed with strikingly different levels of DNA methylation (such as those of Drosophila and human) exhibited similar TpG (CpA) levels; (ii) the correlations between GC levels of large (50 kb) DNA sequences and TpA or CpG shortage levels do not appear to be due to CpG methylation followed by deamination; (iii) CpG dinucleotides are more frequent in fishes than in mammals; interestingly, the monotreme Ornithorhynchus anatinus shows an intermediate CpG frequency. The implications of these results are discussed.
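Dinucleotide deficiency or excess is commonly expressed as an observed/expected ratio, where the expected frequency of a dinucleotide XpY is the product of the single-base frequencies of X and Y. The abstract does not state which statistic was used, so the Python sketch below is an assumption for illustration only; the function name `dinucleotide_oe` and the toy sequence are hypothetical, and only one strand is counted.

```python
from collections import Counter


def dinucleotide_oe(seq):
    """Observed/expected ratio for each of the 16 dinucleotides on one strand.

    Expected frequency of XpY is f(X) * f(Y); a ratio below 1 indicates a
    shortage (e.g. CpG deficiency) and a ratio above 1 an excess.
    """
    seq = seq.upper()
    n = len(seq)
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = (base_counts[x] / n) * (base_counts[y] / n)
            observed = pair_counts[x + y] / (n - 1)
            ratios[x + y] = observed / expected if expected else float("nan")
    return ratios


# Hypothetical toy sequence; real analyses would use long genomic segments.
ratios = dinucleotide_oe("ACGT" * 4)
cpg_ratio = ratios["CG"]
```

To compare TpG and CpA on an equal footing, as the abstract's "TpG (CpA)" notation suggests, one could pool counts from the sequence and its reverse complement before taking the ratio.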
|
38
|
Abstract
The localization of HIV-1 proviruses in compositional DNA fractions from 27 AIDS patients during the chronic phase of the disease with depletion of CD4+ and different levels of viremia showed the following. (1) At low viremia, proviruses are predominantly localized in the GC-richest isochores, which are characterized by an open chromatin structure; this result mimics findings on HIV-1 integration in early infected cells in culture. (2) At higher viremia, an increased distribution of proviruses in GC-poor isochores (which match the GC poorness of HIV-1) was found; this suggests a selection of cells in which the 'isopycnic' localization leads to a higher expression of proviruses and, in turn, to higher viremia. (3) At the highest viremia, integrations in GC-rich isochores are often predominant again, but generally not at the same level as in (1); this may be the consequence of new integrations from the extremely abundant RNA copies.
|
39
|
Abstract
A recent paper by Belle et al. (J. Mol. Evol. 55 (2002) 356) reported an analysis of mean GC3 (the GC level of third codon positions) and standard deviations of GC3 of vertebrate genomes as related to body temperature, and concluded that "the thermal stability hypothesis does not appear to explain the general patterns of composition", apparently contradicting a previous working hypothesis from our laboratory. We have analyzed the data of Belle et al. and find that their data not only do not contradict the thermal stability hypothesis, but if anything support it.
|
40
|
Abstract
A positive correlation holds between the GC level of third codon positions of human genes (GC3) and hydropathy of the encoded proteins. This correlation may appear counterintuitive, since it links a physical property of proteins to the base composition of 'synonymous' sites. We here establish the nontriviality of the correlation, which has recently been contested. In particular, the correlation cannot simply be a consequence of an analogous correlation for first and second codon positions, since no such correlation exists. More generally, for any explanation via two chained correlations, the intermediate property would need to be strongly correlated with hydrophobicity and/or GC3.
|
41
|
Abstract
Since many gene duplications in the human genome are ancient duplications going back to the origin of vertebrates, the question may be asked about the fate of such duplicated genes at the compositional genome transitions that occurred between cold- and warm-blooded vertebrates. Indeed, at that transition, about half of the (GC-poor) genes of cold-blooded vertebrates (the genes of the gene-dense "ancestral genome core") underwent a GC enrichment to become the genes of the "genome core" of warm-blooded vertebrates. Since the compositional distribution of the human duplicated genes investigated (1111 pairs) mimics the general distribution of human genes (about 50% GC3-poor and 50% GC3-rich genes, the border being at 60% GC3), we considered two possibilities, namely that the compositional transition affected either (i) about half of the copies on a random basis, or (ii) preferentially only one copy of the duplicated genes. The two possibilities could be distinguished if each copy is put into one of two subsets according to its GC3 level. Indeed, in the first case, the two distributions would be similar, whereas in the second case, the two distributions would be different, one copy having maintained the ancestral GC-poor composition, and one copy having undergone the compositional change. Using this approach, we could show that, by and large, one copy of the duplicated genes preferentially underwent the GC enrichment. This result implies that this copy, which had possibly acquired a different function and/or regulation, was preferentially translocated into the gene-dense compartment of the genome, the "ancestral genome core", namely the "gene space" which underwent the compositional transition at the emergence of warm-blooded vertebrates.
|
42
|
Abstract
Gene prediction relies on the identification of characteristic features of coding sequences that distinguish them from non-coding DNA. The recent large-scale sequencing of entire genomes from higher eukaryotes, in conjunction with currently used gene prediction algorithms, has provided an abundance of putative genes that can now be analysed for their compositional properties. Strong, systematic differences still exist, in several species, between the compositional properties of sets of ex novo predicted genes and genes that have been experimentally detected and/or verified. This is particularly evident in the estimated gene set (>45,000 genes) of the recently sequenced rice genome, where roughly half the predicted genes are compositionally unusual and have no known orthologues in the dicot Arabidopsis. In a few cases such differences might suggest a bias in experimental gene-finding protocols, but the quasi-random nature of the compositionally aberrant predicted genes is a strong indication that many, if not most, of them are false positives. It therefore appears that some important features of coding regions have not yet been taken into account in existing gene prediction programs. Statistical base compositional properties of curated gene data sets from vertebrates, which we briefly review here, should therefore provide a useful benchmark for fine-tuning probabilistic gene models and model parameters that are currently in use.
|
43
|
|
44
|
Abstract
Alus and LINEs (LINE1) are widespread classes of repeats that are very unevenly distributed in the human genome. The majority of GC-poor LINEs reside in the GC-poor isochores whereas GC-rich Alus are mostly present in GC-rich isochores. The discovery that LINEs and Alus share similar target site duplication and a common AT-rich insertion site specificity raised the question as to why these two families of repeats show such a different distribution in the genome. This problem was investigated here by studying the isochore distributions of subfamilies of LINEs and Alus characterized by different degrees of divergence from the consensus sequences, and of Alus, LINEs and pseudogenes located on chromosomes 21 and 22. Young Alus are more frequent in the GC-poor part of the genome than old Alus. This suggests that the gradual accumulation of Alus in GC-rich isochores has occurred because of their higher stability in compositionally matching chromosomal regions. Densities of Alus and LINEs increase and decrease, respectively, with increasing GC levels, except for the telomeric regions of the analyzed chromosomes. In addition to LINEs, processed pseudogenes are also more frequent in GC-poor isochores. Finally, the present results on Alu and LINE stability/exclusion predict significant losses of Alu DNA from the GC-poor isochores during evolution, a phenomenon apparently due to negative selection against sequences that differ from the isochore composition.
|
45
|
Abstract
In the present work we show that in the Drosophila genome (which covers a 37-51% GC range at a DNA size of approx. 50 kb) a linear correlation holds between the GC (or GC3) levels of coding sequences and the GC levels of the 50 kb genomic sequences embedding them. This correlation allows us to position the two compositional distributions of (a) coding sequences and (b) long DNA segments relative to each other, and to calculate gene concentration across the compositional range of the Drosophila genome. Using this approach, we show that gene concentration increases with increasing GC of the regions embedding the genes, reaching a 7-fold higher level in the GC-richest regions compared with the GC-poorest regions. The gene distribution of the Drosophila genome is, therefore, similar to (although less striking than) that of the human genome, whereas it is very different from that of the Arabidopsis genome, which has about the same size as the Drosophila genome.
|
46
|
Erratum to: “The correlation of protein hydropathy with the base composition of coding sequences” [Gene 238 (1999) 3–14]. Gene X 2000. [DOI: 10.1016/s0378-1119(99)00504-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022] Open
|
47
|
Gene expression, amino acid conservation, and hydrophobicity are the main factors shaping codon preferences in Mycobacterium tuberculosis and Mycobacterium leprae. J Mol Evol 2000; 50:45-55. [PMID: 10654259 DOI: 10.1007/s002399910006] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Abstract
Mycobacterium tuberculosis and Mycobacterium leprae are the etiological agents of tuberculosis and leprosy, respectively. After performing extensive comparisons between genes from these two GC-rich bacterial species, we were able to construct a set of 275 homologous genes. Since these two bacterial species also have a very low growth rate, translational selection could not be as decisive for their codon preferences as it is in fast-growing bacteria. Indeed, principal-components analysis of codon usage from this set of homologous genes revealed that the codon choices in M. tuberculosis and M. leprae are correlated not only with compositional constraints and translational selection, but also with the degree of amino acid conservation and the hydrophobicity of the encoded proteins. Finally, significant correlations were found between GC3 and synonymous distances as well as between synonymous and nonsynonymous distances.
|
48
|
Correlations of nucleotide substitution rates and base composition of mammalian coding sequences with protein structure. Gene 1999; 238:23-31. [PMID: 10570980 DOI: 10.1016/s0378-1119(99)00258-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.0] [Indexed: 10/18/2022]
Abstract
We investigated the relationships between the nucleotide substitution rates and the predicted secondary structures in the three-state representation (alpha-helix, beta-sheet, and coil). The analysis was carried out on 34 alignments, each of which comprised sequences belonging to at least four different mammalian orders. The rates of synonymous substitution were found to be significantly different in regions predicted to be alpha-helix, beta-sheet, or coil. Likewise, the nonsynonymous rates also differ, although, as expected, to a lesser extent, in the three types of secondary structure, suggesting that different selective constraints associated with the different structures affect the synonymous and nonsynonymous rates in a similar way. Moreover, the base composition of the third codon positions is different in coding sequence regions corresponding to different secondary structures of proteins.
|
49
|
Abstract
The "universal correlation" (D'Onofrio, G., Bernardi, G., 1992. A universal compositional correlation among codon positions. Gene 110, 81-88.) that holds between <GC3> and <GC1> or <GC2> (<GC> values are the average values of the coding sequences of each genome analyzed) at both the inter- and intra-genomic level, was re-analyzed on a vastly larger dataset. The results showed a slight, but significant, difference in the <GC3> vs. <GC1> correlations exhibited by prokaryotes and eukaryotes. This finding prompted an analysis of the correlation between <GC3> and the amino acid frequencies in the encoded proteins, which has shown that positive correlations exist between <GC3> values of coding sequences and the hydropathy of the corresponding proteins. These correlations are due to the fact that hydrophobic and amphipathic amino acids increase, whereas hydrophilic amino acids decrease with increasing <GC3> values. Hydropathy values of prokaryotic proteins are systematically higher than those of eukaryotes, but the slopes of the regression lines are identical. The lower hydrophobicity of eukaryotic proteins is due to differences in the amino acid composition. In particular, the twofold higher cysteine (and disulfide bond) level of eukaryotic proteins compared to prokaryotic proteins most probably compensates for their lower hydrophobicity. This supports the viewpoint that hydrophobicity plays a structural and functional role as far as protein stability is concerned.
|
50
|
Abstract
A compositional transition was previously detected by comparing orthologous coding sequences from cold- and warm-blooded vertebrates (see Bernardi, G., Hughes, S., Mouchiroud, D., 1997. The major compositional transitions in the vertebrate genome. J. Mol. Evol. 44, S44-S51 for a review). The transition is characterized by higher GC levels (GC is the molar ratio of guanine+cytosine in DNA) and, especially, by higher GC3 levels (GC3 is the GC level of third codon positions) in coding sequences from warm-blooded vertebrates. This transition essentially affects GC-rich genes, although the nucleotide substitution rate is of the same order of magnitude in both GC-poor and GC-rich genes. In order to understand the evolutionary basis of the changes, we have compared the hydrophobicity of orthologous proteins from Xenopus and human. Although the differences are small in proteins encoded by coding sequences ranging from 0 to 65% in GC3, they are large in the proteins encoded by sequences characterized by GC3 values higher than 65%. The latter proteins are more hydrophobic in human than in Xenopus.
|