1
Shi M, Wang Y, Lv P, Gong Y, Sha Q, Zhao X, Zhou W, Meng L, Han Z, Zhang L, Sun Y. Genome-wide characterization and expression analysis of the ADF gene family in response to salt and drought stress in alfalfa (Medicago sativa). FRONTIERS IN PLANT SCIENCE 2025; 15:1520267. [PMID: 39949635 PMCID: PMC11821967 DOI: 10.3389/fpls.2024.1520267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Accepted: 12/26/2024] [Indexed: 02/16/2025]
Abstract
The microfilament cytoskeleton, formed by actin polymerization, not only supports the morphology of the cell but also regulates a number of cellular activities. Actin-depolymerizing factors (ADFs) are a significant class of actin-binding proteins that regulate dynamic alterations in the microfilament framework, thereby playing a pivotal role in plant growth and development. They are also instrumental in modulating stress responses in plants. The ADF gene family has been explored in various plants, but little was known about it in alfalfa (Medicago sativa), one of the most significant leguminous forage crops globally. In this study, a total of nine ADF genes (designated MsADF1 through MsADF9) were identified in the alfalfa genome and mapped to five different chromosomes. A phylogenetic analysis indicated that the MsADF genes could be classified into four distinct groups, with members within the same group exhibiting comparable gene structures and conserved motifs. The analysis of the Ka/Ks ratios indicated that the MsADF genes underwent purifying selection during their evolutionary expansion. The promoter regions of these genes were found to contain multiple cis-acting elements related to hormone responses, defence, and stress, indicating that they may respond to a variety of developmental and environmental stimuli. Gene expression profiles analyzed by RT-qPCR demonstrated that MsADF genes exhibited distinct expression patterns among different organs. Furthermore, the majority of MsADF genes were induced by salt and drought stress by more than two-fold, with MsADF1, 2/3, 6, and 9 being highly induced, suggesting their critical role in resistance to abiotic stress. These results provide comprehensive information on the MsADF gene family in alfalfa and lay a solid foundation for elucidating their biological function.
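The Ka/Ks reasoning in this abstract can be illustrated with a minimal sketch. It assumes only the usual reading of the ratio (below 1 suggests purifying selection, about 1 neutral evolution, above 1 positive selection). The function name, the tolerance band around 1, and the example values are hypothetical and are not taken from the paper.

```python
def classify_selection(ka: float, ks: float, tol: float = 0.05) -> str:
    """Interpret a Ka/Ks ratio for one gene pair.

    ka: nonsynonymous substitution rate, ks: synonymous substitution rate.
    tol is an illustrative band around 1 treated as "neutral" (an assumption).
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"


# Hypothetical rates for illustration only (not values from the paper).
print(classify_selection(0.02, 0.40))
```

In practice Ka and Ks are estimated from codon alignments by dedicated tools; this sketch covers only the final interpretation step.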
Affiliation(s)
- Mengmeng Shi
- College of Agriculture and Biology, Liaocheng University, Liaocheng, China
- Yike Wang
- College of Agriculture and Biology, Liaocheng University, Liaocheng, China
- Peng Lv
- College of Agriculture and Biology, Liaocheng University, Liaocheng, China
- Yujie Gong
- College of Agriculture and Biology, Liaocheng University, Liaocheng, China
- Qi Sha
- College of Agriculture and Biology, Liaocheng University, Liaocheng, China
- Xinyan Zhao
- College of Agriculture and Biology, Liaocheng University, Liaocheng, China
- Wen Zhou
- College of Agriculture and Biology, Liaocheng University, Liaocheng, China
- Lingtao Meng
- Shandong Binnong Technology Co., Ltd., Binzhou, China
- Zegang Han
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Lingxiao Zhang
- College of Agriculture and Biology, Liaocheng University, Liaocheng, China
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, College of Resources and Environmental Sciences, Nanjing Agricultural University, Nanjing, China
- Yongwang Sun
- College of Agriculture and Biology, Liaocheng University, Liaocheng, China
- Shandong Binnong Technology Co., Ltd., Binzhou, China
2
Wang Y, Tang H, Wang X, Sun Y, Joseph PV, Paterson AH. Detection of colinear blocks and synteny and evolutionary analyses based on utilization of MCScanX. Nat Protoc 2024; 19:2206-2229. [PMID: 38491145 DOI: 10.1038/s41596-024-00968-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 12/20/2023] [Indexed: 03/18/2024]
Abstract
As different taxa evolve, gene order often changes slowly enough that chromosomal 'blocks' with conserved gene orders (synteny) are discernible. The MCScanX toolkit ( https://github.com/wyp1125/MCScanX ) was published in 2012 as freely available software for the detection of such 'colinear blocks' and subsequent synteny and evolutionary analyses based on genome-wide gene location and protein sequence information. Owing to its simplicity and high efficiency for colinear block detection, MCScanX provides a powerful tool for conducting diverse synteny and evolutionary analyses. Moreover, the detection of colinear blocks has been embraced as an integral step for pangenome graph construction. Here, new application trends of MCScanX are explored, striving to better connect this increasingly used tool to other tools and accelerate insight generation from exponentially growing sequence data. We provide a detailed protocol that covers how to install MCScanX on diverse platforms, tune parameters, prepare input files from data from the National Center for Biotechnology Information, run MCScanX and its visualization and evolutionary analysis tools, and connect MCScanX with external tools, including MCScanX-transposed, Circos and SynVisio. This protocol is easily implemented by users with minimal computational background and is adaptable to new data of interest to them. The data and utility programs for this protocol can be obtained from http://bdx-consulting.com/mcscanx-protocol .
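The idea of a colinear block can be made concrete with a toy sketch. This is not MCScanX's algorithm. It is a simplified greedy chaining, written under the assumption that homologous gene pairs ("anchors") are already given as gene-order positions in two genomes. The function name, the `max_gap` and `min_size` parameters, and the data are all hypothetical.

```python
def colinear_blocks(anchors, max_gap=2, min_size=3):
    """Greedily chain anchors (pos_a, pos_b) into forward-oriented blocks.

    An anchor extends the current block when both positions advance by at
    most max_gap genes; blocks with fewer than min_size anchors are dropped.
    """
    blocks, current = [], []
    for a, b in sorted(anchors):
        if current:
            prev_a, prev_b = current[-1]
            if 0 < a - prev_a <= max_gap and 0 < b - prev_b <= max_gap:
                current.append((a, b))
                continue
            # The chain is broken: keep the finished block if it is large enough.
            if len(current) >= min_size:
                blocks.append(current)
        current = [(a, b)]
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Hypothetical anchors: gene order in genome A paired with gene order in genome B.
anchors = [(1, 10), (2, 11), (3, 12), (4, 30), (7, 5), (8, 6), (9, 8)]
for block in colinear_blocks(anchors):
    print(block)
```

The sketch handles only same-orientation blocks and lets a single stray anchor break a chain, so it is meant to show the concept rather than to replace a real synteny tool.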
Affiliation(s)
- Yupeng Wang
- BDX Research & Consulting LLC, Herndon, VA, USA
- Plant Genome Mapping Laboratory, The University of Georgia, Athens, GA, USA
- Haibao Tang
- Plant Genome Mapping Laboratory, The University of Georgia, Athens, GA, USA
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, Fujian Agriculture and Forestry University, Fuzhou, China
- Xiyin Wang
- Plant Genome Mapping Laboratory, The University of Georgia, Athens, GA, USA
- Center for Genomics, College of Science, North China University of Science and Technology, Tangshan, China
- Ying Sun
- BDX Research & Consulting LLC, Herndon, VA, USA
- Paule V Joseph
- Section of Sensory Science and Metabolism, National Institute on Alcohol Abuse and Alcoholism, Bethesda, MD, USA
- National Institute of Nursing Research, Bethesda, MD, USA
- Andrew H Paterson
- Plant Genome Mapping Laboratory, The University of Georgia, Athens, GA, USA
3
Neves F, Muñoz-Mérida A, Machado AM, Almeida T, Gaigher A, Esteves PJ, Castro LFC, Veríssimo A. Uncovering a 500 million year old history and evidence of pseudogenization for TLR15. Front Immunol 2022; 13:1020601. [PMID: 36605191 PMCID: PMC9808068 DOI: 10.3389/fimmu.2022.1020601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 11/23/2022] [Indexed: 12/24/2022]
Abstract
Introduction: Toll-like receptors (TLRs) are at the front line of pathogen recognition and host immune response. Many TLR genes have been described to date, with some being found across metazoans while others are restricted to specific lineages. A cryptic member of the TLR gene family, TLR15, has a unique phylogenetic distribution. Initially described in extant species of birds and reptiles, an ortholog has been reported for cartilaginous fish. Methods: Here, we significantly expanded the evolutionary analysis of the TLR15 gene, taking advantage of large genomic and transcriptomic resources available from different lineages of vertebrates. Additionally, we objectively searched for TLR15 in lobe-finned and ray-finned fish, as well as in cartilaginous fish and jawless vertebrates. Results and discussion: We confirm the presence of TLR15 in early-branching jawed vertebrates (the cartilaginous fish) as well as in basal Sarcopterygii (lungfish). However, within cartilaginous fish, the gene is present in Holocephalans (all three families) but not in Elasmobranchs (its sister lineage). Holocephalans have long TLR15 protein sequences that disrupt the typical TLR structure, and some species display a pseudogene sequence due to the presence of frameshift mutations and early stop codons. Additionally, TLR15 has low expression levels in holocephalans when compared with other TLR genes. In turn, lungfish also have long TLR15 protein sequences, but the protein structure is not compromised. Finally, TLR15 presents several sites under negative selection. Overall, these results suggest that TLR15 is an ancient TLR gene and is experiencing ongoing pseudogenization in early-branching vertebrates.
Affiliation(s)
- Fabiana Neves
- CIBIO‐InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Antonio Muñoz-Mérida
- CIBIO‐InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
- André M. Machado
- Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
- CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal
- Tereza Almeida
- CIBIO‐InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Arnaud Gaigher
- CIBIO‐InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Research Group for Evolutionary Immunogenomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
- Pedro J. Esteves
- CIBIO‐InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
- CITS - Center of Investigation in Health Technologies, CESPU, Gandra, Portugal
- L. Filipe C. Castro
- Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
- CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal
- Ana Veríssimo
- CIBIO‐InBIO, Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
4
Irwin DM. Variation in the Evolution and Sequences of Proglucagon and the Receptors for Proglucagon-Derived Peptides in Mammals. Front Endocrinol (Lausanne) 2021; 12:700066. [PMID: 34322093 PMCID: PMC8312260 DOI: 10.3389/fendo.2021.700066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/25/2021] [Accepted: 06/24/2021] [Indexed: 01/12/2023]
Abstract
The mammalian proglucagon gene (Gcg) encodes three glucagon like sequences, glucagon, glucagon-like peptide-1 (GLP-1), and glucagon-like peptide-2 that are of similar length and share sequence similarity, with these hormones having cell surface receptors, glucagon receptor (Gcgr), GLP-1 receptor (Glp1r), and GLP-2 receptor (Glp2r), respectively. Gcgr, Glp1r, and Glp2r are all class B1 G protein-coupled receptors (GPCRs). Despite their sequence and structural similarity, analyses of sequences from rodents have found differences in patterns of sequence conservation and evolution. To determine whether these were rodent-specific traits or general features of these genes in mammals I analyzed coding and protein sequences for proglucagon and the receptors for proglucagon-derived peptides from the genomes of 168 mammalian species. Single copy genes for each gene were found in almost all genomes. In addition to glucagon sequences within Hystricognath rodents (e.g., guinea pig), glucagon sequences from a few other groups (e.g., pangolins and some bats) as well as changes in the proteolytic processing of GLP-1 in some bats are suggested to have functional effects. GLP-2 sequences display increased variability but accepted few substitutions that are predicted to have functional consequences. In parallel, Glp2r sequences display the most rapid protein sequence evolution, and show greater variability in amino acids at sites involved in ligand interaction, however most were not predicted to have a functional consequence. These observations suggest that a greater diversity in biological functions for proglucagon-derived peptides might exist in mammals.
Affiliation(s)
- David M. Irwin
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Banting and Best Diabetes Centre, University of Toronto, Toronto, ON, Canada
5
Holthaus KB, Eckhart L, Dalla Valle L, Alibardi L. Review: Evolution and diversification of corneous beta‐proteins, the characteristic epidermal proteins of reptiles and birds. JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION 2019; 330:438-453. [DOI: 10.1002/jez.b.22840] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 07/10/2018] [Revised: 11/28/2018] [Accepted: 12/23/2018] [Indexed: 02/04/2023]
Affiliation(s)
- Karin Brigit Holthaus
- Department of Dermatology, Medical University of Vienna, Wien, Austria
- Dipartimento di Scienze Biologiche, Geologiche ed Ambientali (BiGeA), University of Bologna, Bologna, Italy
- Leopold Eckhart
- Department of Dermatology, Medical University of Vienna, Wien, Austria
- Lorenzo Alibardi
- Dipartimento di Scienze Biologiche, Geologiche ed Ambientali (BiGeA), University of Bologna, Bologna, Italy
- Comparative Histolab Padova, Padova, Italy
6
Galpert D, Fernández A, Herrera F, Antunes A, Molina-Ruiz R, Agüero-Chapin G. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers. BMC Bioinformatics 2018; 19:166. [PMID: 29724166 PMCID: PMC5934817 DOI: 10.1186/s12859-018-2148-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/13/2017] [Accepted: 04/04/2018] [Indexed: 12/24/2022]
Abstract
Background: The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in a supervised big data approach, they were all, to some extent, alignment-based features, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. Results: The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built with only alignment-based similarity measures or combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) be achieved within the twilight zone, for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers, while the runners-up were alignment-free features related to amino acid composition. Conclusions: The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes relative to the classification quality achieved with alignment-based similarity measures alone. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.
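To make "alignment-free features related to amino acid composition" concrete, here is a minimal sketch of one such pairwise descriptor. It is an illustration only, assuming a simple 20-dimensional composition vector and an L1 distance. The paper's actual feature set is not reproduced here, and the function names and sequences are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> list[float]:
    """Return the fraction of each standard amino acid in a protein sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_distance(seq_a: str, seq_b: str) -> float:
    """L1 distance between two composition vectors; no alignment is needed."""
    return sum(abs(x - y) for x, y in zip(composition(seq_a), composition(seq_b)))


# Hypothetical short peptides for illustration only.
print(composition_distance("MKTAYIAKQR", "MKTAYLAKQR"))
```

A value like this could be one column in a feature table for a pairwise classifier, next to alignment-based scores.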
Affiliation(s)
- Deborah Galpert
- Departamento de Ciencia de la Computación, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Alberto Fernández
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Francisco Herrera
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071, Granada, Spain
- Agostinho Antunes
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Reinaldo Molina-Ruiz
- Centro de Bioactivos Químicos (CBQ), Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830, Santa Clara, Cuba
- Guillermin Agüero-Chapin
- CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Matosinhos, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007, Porto, Portugal
- Centro de Bioactivos Químicos (CBQ), Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830, Santa Clara, Cuba
7
Song H, Lin K, Hu J, Pang E. An Updated Functional Annotation of Protein-Coding Genes in the Cucumber Genome. FRONTIERS IN PLANT SCIENCE 2018; 9:325. [PMID: 29599790 PMCID: PMC5863696 DOI: 10.3389/fpls.2018.00325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2017] [Accepted: 02/27/2018] [Indexed: 06/08/2023]
Abstract
Background: Although the cucumber reference genome and its annotation were published several years ago, the functional annotation of predicted genes, particularly protein-coding genes, still requires further improvement. In general, accurately determining orthologous relationships between genes allows for better and more robust functional assignments of predicted genes. As one of the most reliable strategies, the determination of collinearity information may facilitate reliable orthology inferences among genes from multiple related genomes. Currently, the identification of collinear segments has mainly been based on conservation of gene order and orientation. Over the course of plant genome evolution, various evolutionary events have disrupted or distorted the order of genes along chromosomes, making it difficult to use those genes as genome-wide markers for plant genome comparisons. Results: Using the localized LASTZ/MULTIZ analysis pipeline, we aligned 15 genomes, including cucumber and other related angiosperm plants, and identified a set of genomic segments that are short in length, stable in structure, uniform in distribution and highly conserved across all 15 plants. Compared with protein-coding genes, these conserved segments were more suitable for use as genomic markers for detecting collinear segments among distantly divergent plants. Guided by this set of identified collinear genomic segments, we inferred 94,486 orthologous protein-coding gene pairs (OPPs) between cucumber and 14 other angiosperm species, which were used as proxies for transferring functional terms to cucumber genes from the annotations of the other 14 genomes. In total, 10,885 protein-coding genes were assigned Gene Ontology (GO) terms, nearly 1,300 more than in the results collected in the Uniprot-proteomic database. Our results showed that annotation accuracy would be improved compared with other existing approaches. Conclusions: In this study, we provided an alternative resource for the functional annotation of predicted cucumber protein-coding genes, which we expect will be beneficial for biological studies of cucumber and which is accessible from http://cmb.bnu.edu.cn/functional_annotation. Meanwhile, using the cucumber reference genome as a case study, we presented an efficient strategy for transferring gene functional information from previously well-characterized protein-coding genes in model species to newly sequenced or "non-model" plant species.
Affiliation(s)
- Hongtao Song
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Jinglu Hu
- Graduate School of Information, Production and Systems, Waseda University, Kitakyushu-shi, Japan
- Erli Pang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
8
Li Y, He L, Li J, Chen J, Liu C. Genome-Wide Identification, Characterization, and Expression Profiling of the Legume BZR Transcription Factor Gene Family. FRONTIERS IN PLANT SCIENCE 2018; 9:1332. [PMID: 30283468 PMCID: PMC6156370 DOI: 10.3389/fpls.2018.01332] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 11/25/2017] [Accepted: 08/24/2018] [Indexed: 05/19/2023]
Abstract
The BRASSINAZOLE-RESISTANT (BZR) family of transcription factors (TFs) are positive regulators in the biosynthesis of brassinosteroids. The latter is a class of steroid hormones that affect a variety of developmental and physiological processes in plants. BZR TFs play essential roles in the regulation of plant growth and development, including multiple stress-resistance functions. However, the evolutionary history and individual expression patterns of the legume BZR genes have not been determined. In this study, we performed a genome-wide investigation of the BZR gene family in seven legume species. In total, 52 BZR genes were identified and characterized. By analyzing their phylogeny, we divided these BZR genes into five groups by comparison with orthologs/paralogs in Arabidopsis thaliana. The intron/exon structural patterns and conserved protein motifs of each gene were analyzed and showed high group specificity. Legume BZR genes were unevenly distributed among their corresponding genomes. Genome and gene sequence comparisons revealed that gene expansion of the BZR TF family in legumes mainly resulted from segmental duplications and that this family has undergone purifying selection. Synteny analysis showed that BZR genes tended to localize within syntenic blocks conserved across legume genomes. The expression patterns of BZR genes among various legume vegetative tissues and in response to different abiotic stresses were analyzed using a combination of public transcriptome data and quantitative PCR. The patterns indicated that many BZR genes regulate legume organ development and differentiation, and respond significantly to drought and salt stresses. This study may provide valuable information for understanding the evolution of BZR gene structure and expression, and lays a foundation for future functional analysis of the legume BZR genes by species and by gene.
9
Kirk IK, Weinhold N, Brunak S, Belling K. The impact of the protein interactome on the syntenic structure of mammalian genomes. PLoS One 2017; 12:e0179112. [PMID: 28910296 PMCID: PMC5598925 DOI: 10.1371/journal.pone.0179112] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/26/2016] [Accepted: 05/10/2017] [Indexed: 02/06/2023]
Abstract
Conserved synteny denotes evolutionarily preserved gene order across species. It is not well understood to what degree functional relationships between genes are preserved in syntenic blocks. Here we investigate whether protein-coding genes conserved in mammalian syntenic blocks encode gene products that serve the common functional purpose of interacting at the protein level, i.e. connectivity. High connectivity among protein-protein interactions (PPIs) was only moderately associated with conserved synteny on a genome-wide scale. However, we observed a smaller subset of 3.6% of all syntenic blocks with high-confidence PPIs that had significantly higher connectivity than expected by chance. Additionally, syntenic blocks with high-confidence PPIs contained significantly more chromatin loops than the remaining blocks, indicating functional preservation among these syntenic blocks. Conserved synteny is typically defined by sequence similarity. In this study, we also examined whether a functional relationship, here PPI connectivity, can identify syntenic blocks independently of orthology. While orthology-based syntenic blocks with high-confidence PPIs and the connectivity-based syntenic blocks largely overlapped, the connectivity-based approach identified additional syntenic blocks that were not found by conventional sequence-based methods alone. Additionally, the connectivity-based approach enabled identification of potential orthologous genes between species. Our analyses demonstrate that subsets of syntenic blocks are associated with highly connected proteins, and that PPI connectivity can be used to detect conserved synteny even if sequence conservation drifts beyond what orthology algorithms normally can identify.
Affiliation(s)
- Isa Kristina Kirk
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Nils Weinhold
- Memorial Sloan Kettering Cancer Center, Computational Biology Program, New York, NY, United States of America
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kirstine Belling
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
10
Tekaia F. Inferring Orthologs: Open Questions and Perspectives. GENOMICS INSIGHTS 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties to assess orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling together with combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detecting speciation events appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Affiliation(s)
- Fredj Tekaia
- Institut Pasteur, Unit of Structural Microbiology, CNRS URA 3528 and University Paris Diderot, Sorbonne Paris Cité, Paris, France
11
Galpert D, del Río S, Herrera F, Ancede-Gallardo E, Antunes A, Agüero-Chapin G. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species. BIOMED RESEARCH INTERNATIONAL 2015; 2015:748681. [PMID: 26605337 PMCID: PMC4641943 DOI: 10.1155/2015/748681] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 04/07/2015] [Revised: 07/26/2015] [Accepted: 08/20/2015] [Indexed: 11/17/2022]
Abstract
Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.
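The abstract uses RBH (reciprocal best hits) as one baseline. A minimal sketch of that idea follows, assuming similarity scores between the genes of two genomes are already available (for example from a sequence search). The function name, input layout, and scores are hypothetical and do not reproduce the paper's pipeline.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return gene pairs (a, b) that are each other's best-scoring hit.

    scores_ab maps each gene in genome A to {gene in B: score};
    scores_ba holds the reverse-direction search results.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores for illustration only.
scores_ab = {"a1": {"b1": 200, "b2": 50}, "a2": {"b2": 120, "b1": 90}, "a3": {"b1": 80}}
scores_ba = {"b1": {"a1": 210, "a3": 70}, "b2": {"a2": 115, "a1": 40}}
print(reciprocal_best_hits(scores_ab, scores_ba))
```

Here `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so `a3` is left unpaired. RBH is conservative in this way, which is one reason supervised approaches look at additional pair features.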
Affiliation(s)
- Deborah Galpert
- Departamento de Ciencias de la Computación, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
- Sara del Río
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071 Granada, Spain
- Francisco Herrera
- Department of Computer Science and Artificial Intelligence, Research Center on Information and Communications Technology (CITIC-UGR), University of Granada, 18071 Granada, Spain
- Evys Ancede-Gallardo
- Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
- Agostinho Antunes
- Centro Interdisciplinar de Investigação Marinha e Ambiental (CIMAR/CIIMAR), Universidade do Porto, Rua dos Bragas 177, 4050-123 Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Guillermin Agüero-Chapin
- Centro de Bioactivos Químicos, Universidad Central “Marta Abreu” de Las Villas (UCLV), 54830 Santa Clara, Cuba
- Centro Interdisciplinar de Investigação Marinha e Ambiental (CIMAR/CIIMAR), Universidade do Porto, Rua dos Bragas 177, 4050-123 Porto, Portugal
12
Lacroix T, Loux V, Gendrault A, Hoebeke M, Gibrat JF. Insyght: navigating amongst abundant homologues, syntenies and gene functional annotations in bacteria, it's that symbol! Nucleic Acids Res 2014; 42:gku867. [PMID: 25249626 PMCID: PMC4245967 DOI: 10.1093/nar/gku867] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/21/2013] [Revised: 08/28/2014] [Accepted: 09/10/2014] [Indexed: 11/14/2022]
Abstract
High-throughput techniques have considerably increased the potential of comparative genomics whilst simultaneously posing many new challenges. One of those challenges involves efficiently mining the large amount of data produced and exploring the landscape of both conserved and idiosyncratic genomic regions across multiple genomes. Domains of application of these analyses are diverse: identification of evolutionary events, inference of gene functions, detection of niche-specific genes or phylogenetic profiling. Insyght is a comparative genomic visualization tool that combines three complementary displays: (i) a table for thoroughly browsing amongst homologues, (ii) a comparator of orthologue functional annotations and (iii) a genomic organization view designed to improve the legibility of rearrangements and distinctive loci. The latter display combines symbolic and proportional graphical paradigms. Synchronized navigation across multiple species and interoperability between the views are core features of Insyght. A gene filter mechanism is provided that helps the user to build a biologically relevant gene set according to multiple criteria such as presence/absence of homologues and/or various annotations. We illustrate the use of Insyght with scenarios. Currently, only Bacteria and Archaea are supported. A public instance is available at http://genome.jouy.inra.fr/Insyght. The tool is freely downloadable for private data set analysis.
Affiliation(s)
- Thomas Lacroix
- INRA, UR 1077 Mathématique Informatique et Génome, 78352 Jouy-en-Josas, France
- Valentin Loux
- INRA, UR 1077 Mathématique Informatique et Génome, 78352 Jouy-en-Josas, France
- Annie Gendrault
- INRA, UR 1077 Mathématique Informatique et Génome, 78352 Jouy-en-Josas, France
- Mark Hoebeke
- CNRS, UPMC, FR2424, ABiMS, Station Biologique, 29680 Roscoff, France
13
Lemmon EM, Lemmon AR. High-Throughput Genomic Data in Systematics and Phylogenetics. ANNUAL REVIEW OF ECOLOGY EVOLUTION AND SYSTEMATICS 2013. [DOI: 10.1146/annurev-ecolsys-110512-135822] [Citation(s) in RCA: 355] [Impact Index Per Article: 29.6] [Indexed: 11/09/2022]
Affiliation(s)
- Emily Moriarty Lemmon
- Department of Biological Science, Florida State University, Biomedical Research Facility, Tallahassee, Florida 32306
- Alan R. Lemmon
- Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, Florida 32306
14
Zuriaga E, Soriano JM, Zhebentyayeva T, Romero C, Dardick C, Cañizares J, Badenes ML. Genomic analysis reveals MATH gene(s) as candidate(s) for Plum pox virus (PPV) resistance in apricot (Prunus armeniaca L.). MOLECULAR PLANT PATHOLOGY 2013; 14:663-77. [PMID: 23672686 PMCID: PMC6638718 DOI: 10.1111/mpp.12037] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 05/22/2023]
Abstract
Sharka disease, caused by Plum pox virus (PPV), is the most important viral disease affecting Prunus species. A major PPV resistance locus (PPVres) has been mapped to the upper part of apricot (Prunus armeniaca) linkage group 1. In this study, a physical map of the PPVres locus in the PPV-resistant cultivar 'Goldrich' was constructed. Bacterial artificial chromosome (BAC) clones belonging to the resistant haplotype contig were sequenced using 454/GS-FLX Titanium technology. Concurrently, the whole genome of seven apricot varieties (three PPV-resistant and four PPV-susceptible) and two PPV-susceptible apricot relatives (P. sibirica var. davidiana and P. mume) were obtained using the Illumina-HiSeq2000 platform. Single nucleotide polymorphisms (SNPs) within the mapped interval, recorded from alignments against the peach genome, allowed us to narrow down the PPVres locus to a region of ∼196 kb. Searches for polymorphisms linked in coupling with the resistance led to the identification of 68 variants within 23 predicted transcripts according to peach genome annotation. Candidate resistance genes were ranked combining data from variant calling and predicted functions inferred from sequence homology. Together, the results suggest that members of a cluster of meprin and TRAF-C homology domain (MATHd)-containing proteins are the most likely candidate genes for PPV resistance in apricot. Interestingly, MATHd proteins are hypothesized to control long-distance movement (LDM) of potyviruses in Arabidopsis, and restriction for LDM is also a major component of PPV resistance in apricot. Although the PPV resistance gene(s) remains to be unambiguously identified, these results pave the way to the determination of the underlying mechanism and to the development of more accurate breeding strategies.
Affiliation(s)
- Elena Zuriaga
- Instituto Valenciano de Investigaciones Agrarias (IVIA), Apartado Oficial, 46113 Moncada, Valencia, Spain
15
An S-locus independent pollen factor confers self-compatibility in 'Katy' apricot. PLoS One 2013; 8:e53947. [PMID: 23342044 PMCID: PMC3544744 DOI: 10.1371/journal.pone.0053947] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 09/14/2012] [Accepted: 12/06/2012] [Indexed: 11/19/2022]
Abstract
Loss of pollen-S function in Prunus self-compatible cultivars has been mostly associated with deletions or insertions in the S-haplotype-specific F-box (SFB) genes. However, self-compatible pollen-part mutants defective for non-S-locus factors have also been found, for instance, in the apricot (Prunus armeniaca) cv. ‘Canino’. In the present study, we report the genetic and molecular analysis of another self-compatible apricot cv. termed ‘Katy’. S-genotype of ‘Katy’ was determined as S1S2 and S-RNase PCR-typing of selfing and outcrossing populations from ‘Katy’ showed that pollen gametes bearing either the S1- or the S2-haplotype were able to overcome self-incompatibility (SI) barriers. Sequence analyses showed no SNP or indel affecting the SFB1 and SFB2 alleles from ‘Katy’ and, moreover, no evidence of pollen-S duplication was found. As a whole, the obtained results are compatible with the hypothesis that the loss-of-function of a S-locus unlinked factor gametophytically expressed in pollen (M’-locus) leads to SI breakdown in ‘Katy’. A mapping strategy based on segregation distortion loci mapped the M’-locus within an interval of 9.4 cM at the distal end of chr.3 corresponding to ∼1.29 Mb in the peach (Prunus persica) genome. Interestingly, pollen-part mutations (PPMs) causing self-compatibility (SC) in the apricot cvs. ‘Canino’ and ‘Katy’ are located within an overlapping region of ∼273 Kb in chr.3. No evidence is yet available to discern if they affect the same gene or not, but molecular markers seem to indicate that both cultivars are genetically unrelated suggesting that every PPM may have arisen independently. Further research will be necessary to reveal the precise nature of ‘Katy’ PPM, but fine-mapping already enables SC marker-assisted selection and paves the way for future positional cloning of the underlying gene.
16
Ahn D, You KH, Kim CH. Evolution of the tbx6/16 subfamily genes in vertebrates: insights from zebrafish. Mol Biol Evol 2012; 29:3959-83. [PMID: 22915831 DOI: 10.1093/molbev/mss199] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 01/06/2023]
Abstract
In any comparative studies striving to understand the similarities and differences of the living organisms at the molecular genetic level, the crucial first step is to establish the homology (orthology and paralogy) of genes between different organisms. Determination of the homology of genes becomes complicated when the genes have undergone a rapid divergence in sequence or when the involved genes are members of a gene family that has experienced a differential gain or loss of its constituents in different taxonomic groups. Organisms with duplicated genomes such as teleost fishes might have been especially prone to these problems because the functional redundancies provided by the duplicate copies of genes would have allowed a rapid divergence or loss of genes during evolution. In this study, we will demonstrate that much of the ambiguities in the determination of the homology between fish and tetrapod genes resulting from the problems like these can be eliminated by complementing the sequence-based phylogenies with nonsequence information, such as the exon-intron structure of a gene or the composition of a gene's genomic neighbors. We will use the Tbx6/16 subfamily genes of zebrafish (tbx6, tbx16, tbx24, and mga genes), which have been well known for the ambiguity of their evolutionary relationships to the Tbx6/16 subfamily genes of tetrapods, as an illustrative example. We will show that, despite the similarity of sequence and expression to the tetrapod Tbx6 genes, zebrafish tbx6 gene is actually a novel T-box gene more closely related to the tetrapod Tbx16 genes, whereas the zebrafish tbx24 gene, hitherto considered to be a novel gene due to the high level of sequence divergence, is actually an ortholog of tetrapod Tbx6 genes. 
We will also show that, after their initial appearance by the multiplication of a common ancestral gene at the beginning of vertebrate evolution, the Tbx6/16 subfamily of vertebrate T-box genes might have experienced differential losses of member genes in different vertebrate groups and gradual pooling of member gene's functions in surviving members, which might have prevented the revelation of the true identity of member genes by way of the comparison of sequence and function.
Affiliation(s)
- Daegwon Ahn
- Department of Biology, Chungnam National University, Daejeon, Republic of Korea
17
Zuriaga E, Molina L, Badenes ML, Romero C. Physical mapping of a pollen modifier locus controlling self-incompatibility in apricot and synteny analysis within the Rosaceae. PLANT MOLECULAR BIOLOGY 2012; 79:229-242. [PMID: 22481163 DOI: 10.1007/s11103-012-9908-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 10/18/2011] [Accepted: 03/23/2012] [Indexed: 05/31/2023]
Abstract
S-locus products (S-RNase and F-box proteins) are essential for the gametophytic self-incompatibility (GSI) specific recognition in Prunus. However, accumulated genetic evidence suggests that other S-locus unlinked factors are also required for GSI. For instance, GSI breakdown was associated with a pollen-part mutation unlinked to the S-locus in the apricot (Prunus armeniaca L.) cv. 'Canino'. Fine-mapping of this mutated modifier gene (M-locus) and the synteny analysis of the M-locus within the Rosaceae are here reported. A segregation distortion loci mapping strategy, based on a selectively genotyped population, was used to map the M-locus. In addition, a bacterial artificial chromosome (BAC) contig was constructed for this region using overlapping oligonucleotides probes, and BAC-end sequences (BES) were blasted against Rosaceae genomes to perform micro-synteny analysis. The M-locus was mapped to the distal part of chr.3 flanked by two SSR markers within an interval of 1.8 cM corresponding to ~364 Kb in the peach (Prunus persica L. Batsch) genome. In the integrated genetic-physical map of this region, BES were mapped against the peach scaffold_3 and BACs were anchored to the apricot map. Micro-syntenic blocks were detected in apple (Malus × domestica Borkh.) LG17/9 and strawberry (Fragaria vesca L.) FG6 chromosomes. The M-locus fine-scale mapping provides a solid basis for self-compatibility marker-assisted selection and for positional cloning of the underlying gene, a necessary goal to elucidate the pollen rejection mechanism in Prunus. In a wider context, the syntenic regions identified in peach, apple and strawberry might be useful to interpret GSI evolution in Rosaceae.
Affiliation(s)
- Elena Zuriaga
- Instituto Valenciano de Investigaciones Agrarias-IVIA, Apartado Oficial, 46113 Moncada, Valencia, Spain.
18
Peralta H, Guerrero G, Aguilar A, Mora J. Sequence variability of Rhizobiales orthologs and relationship with physico-chemical characteristics of proteins. Biol Direct 2011; 6:48. [PMID: 21970442 PMCID: PMC3198989 DOI: 10.1186/1745-6150-6-48] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/11/2011] [Accepted: 10/04/2011] [Indexed: 12/03/2022]
Abstract
Background Chromosomal orthologs can reveal the shared ancestral gene set and their evolutionary trends. Additionally, physico-chemical properties of encoded proteins could provide information about functional adaptation and ecological niche requirements. Results We analyzed 7080 genes (five groups of 1416 orthologs each) from Rhizobiales species (S. meliloti, R. etli, and M. loti, plant symbionts; A. tumefaciens, a plant pathogen; and B. melitensis, an animal pathogen). We evaluated their phylogenetic relationships and observed three main topologies. The first, with closer association of R. etli to A. tumefaciens; the second with R. etli closer to S. meliloti; and the third with A. tumefaciens and S. meliloti as the closest pair. This was not unusual, given the close relatedness of these three species. We calculated the synonymous (dS) and nonsynonymous (dN) substitution rates of these orthologs, and found that informational and metabolic functions showed relatively low dN rates; in contrast, genes from hypothetical functions and cellular processes showed high dN rates. An alternative measure of sequence variability, percentage of changes by species, was used to evaluate the most specific proportion of amino acid residues from alignments. When dN was compared with that measure a high correlation was obtained, revealing that much of evolutive information was extracted with the percentage of changes by species at the amino acid level. By analyzing the sequence variability of orthologs with a set of five properties (polarity, electrostatic charge, formation of secondary structures, molecular volume, and amino acid composition), we found that physico-chemical characteristics of proteins correlated with specific functional roles, and association of species did not follow their typical phylogeny, probably reflecting more adaptation to their life styles and niche preferences. 
In addition, orthologs with low dN rates had residues with more positive values of polarity, volume and electrostatic charge. Conclusions These findings revealed that even when orthologs perform the same function in each genomic background, their sequences reveal important evolutionary tendencies and differences related to adaptation. This article was reviewed by: Dr. Purificación López-García, Prof. Jeffrey Townsend (nominated by Dr. J. Peter Gogarten), and Ms. Olga Kamneva.
Affiliation(s)
- Humberto Peralta
- Programa de Genómica Funcional de Procariotes, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Apdo. postal 565-A, Cuernavaca, Morelos, México
19
Dewey CN. Positional orthology: putting genomic evolutionary relationships into context. Brief Bioinform 2011; 12:401-12. [PMID: 21705766 PMCID: PMC3178058 DOI: 10.1093/bib/bbr040] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Indexed: 12/31/2022]
Abstract
Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology.
Affiliation(s)
- Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, 5785 Medical Sciences Center, 1300 University Ave, Madison, WI 53706, USA.
20
Jun J, Mandoiu II, Nelson CE. Identification of mammalian orthologs using local synteny. BMC Genomics 2009; 10:630. [PMID: 20030836 PMCID: PMC2807883 DOI: 10.1186/1471-2164-10-630] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 06/23/2009] [Accepted: 12/23/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Accurate determination of orthology is central to comparative genomics. For vertebrates in particular, very large gene families, high rates of gene duplication and loss, multiple mechanisms of gene duplication, and high rates of retrotransposition all combine to make inference of orthology between genes difficult. Many methods have been developed to identify orthologous genes, mostly based upon analysis of the inferred protein sequence of the genes. More recently, methods have been proposed that use genomic context in addition to protein sequence to improve orthology assignment in vertebrates. Such methods have been most successfully implemented in fungal genomes and have long been used in prokaryotic genomes, where gene order is far less variable than in vertebrates. However, to our knowledge, no explicit comparison of synteny and sequence based definitions of orthology has been reported in vertebrates, or, more specifically, in mammals. RESULTS We test a simple method for the measurement and utilization of gene order (local synteny) in the identification of mammalian orthologs by investigating the agreement between coding sequence based orthology (Inparanoid) and local synteny based orthology. In the 5 mammalian genomes studied, 93% of the sampled inter-species pairs were found to be concordant between the two orthology methods, illustrating that local synteny is a robust substitute to coding sequence for identifying orthologs. However, 7% of pairs were found to be discordant between local synteny and Inparanoid. These cases of discordance result from evolutionary events including retrotransposition and genome rearrangements. 
CONCLUSIONS By analyzing cases of discordance between local synteny and Inparanoid we show that local synteny can distinguish between true orthologs and recent retrogenes, can resolve ambiguous many-to-many orthology relationships into one-to-one ortholog pairs, and might be used to identify cases of non-orthologous gene displacement by retroduplicated paralogs.
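A minimal sketch of a local synteny measure of the kind this abstract describes: for a candidate gene pair, count how many flanking genes in one genome have a homolog among the flanking genes in the other. The abstract does not give the window size, scoring rule, or threshold, so the window `k`, the function names, and the toy gene orders below are assumptions for illustration only.

```python
def neighbors(order, gene, k):
    """Return up to k genes on each side of `gene` in a chromosome's gene order."""
    i = order.index(gene)
    return order[max(0, i - k):i] + order[i + 1:i + 1 + k]


def local_synteny_score(order_a, gene_a, order_b, gene_b, homologs, k=3):
    """Count neighbors of gene_a that have a homolog among the neighbors of gene_b."""
    near_b = set(neighbors(order_b, gene_b, k))
    return sum(
        1 for g in neighbors(order_a, gene_a, k)
        if homologs.get(g, set()) & near_b
    )


# Hypothetical gene orders; `homologs` maps each gene in genome A to its hits in genome B.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b1", "b2", "b3", "b4", "b5"]
order_retro = ["c1", "c2", "b3r", "c3", "c4"]  # a retrocopy of b3 in an unrelated context
homologs = {"a1": {"b1"}, "a2": {"b2"}, "a4": {"b4"}, "a5": {"x9"}}

syntenic = local_synteny_score(order_a, "a3", order_b, "b3", homologs, k=2)
retro = local_synteny_score(order_a, "a3", order_retro, "b3r", homologs, k=2)
```

In this toy case the candidate in the conserved neighborhood shares three homologous neighbors and the retrocopy shares none, which is the kind of contrast that lets gene order separate an ortholog from a recent retrogene with similar coding sequence.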
Affiliation(s)
- Jin Jun
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269, USA
21
Dong X, Fredman D, Lenhard B. Synorth: exploring the evolution of synteny and long-range regulatory interactions in vertebrate genomes. Genome Biol 2009; 10:R86. [PMID: 19698106 PMCID: PMC2745767 DOI: 10.1186/gb-2009-10-8-r86] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 03/29/2009] [Revised: 06/22/2009] [Accepted: 08/21/2009] [Indexed: 12/17/2022]
Abstract
Genomic regulatory blocks are chromosomal regions spanned by long clusters of highly conserved noncoding elements devoted to long-range regulation of developmental genes, often immobilizing other, unrelated genes into long-lasting syntenic arrangements. Synorth is a web resource for exploring and categorizing the syntenic relationships in genomic regulatory blocks across multiple genomes, tracing their evolutionary fate after teleost whole genome duplication at the level of genomic regulatory block loci, individual genes, and their phylogenetic context.
Affiliation(s)
- Xianjun Dong
- Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway.
22
Hachiya T, Osana Y, Popendorf K, Sakakibara Y. Accurate identification of orthologous segments among multiple genomes. Bioinformatics 2009; 25:853-60. [DOI: 10.1093/bioinformatics/btp070] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
23
Orsi CH, Tanksley SD. Natural variation in an ABC transporter gene associated with seed size evolution in tomato species. PLoS Genet 2009; 5:e1000347. [PMID: 19165318 PMCID: PMC2617763 DOI: 10.1371/journal.pgen.1000347] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 09/11/2008] [Accepted: 12/17/2008] [Indexed: 01/23/2023]
Abstract
Seed size is a key determinant of evolutionary fitness in plants and is a trait that often undergoes tremendous changes during crop domestication. Seed size is most often quantitatively inherited, and it has been shown that Sw4.1 is one of the most significant quantitative trait loci (QTLs) underlying the evolution of seed size in the genus Solanum—especially in species related to the cultivated tomato. Using a combination of genetic, developmental, molecular, and transgenic techniques, we have pinpointed the cause of the Sw4.1 QTL to a gene encoding an ABC transporter gene. This gene exerts its control on seed size, not through the maternal plant, but rather via gene expression in the developing zygote. Phenotypic effects of allelic variation at Sw4.1 are manifested early in seed development at stages corresponding to the rapid deposition of starch and lipids into the endospermic cells. Through synteny, we have identified the Arabidopsis Sw4.1 ortholog. Mutagenesis has revealed that this ortholog is associated with seed length variation and fatty acid deposition in seeds, raising the possibility that the ABC transporter may modulate seed size variation in other species. Transcription studies show that the ABC transporter gene is expressed not only in seeds, but also in other tissues (leaves and roots) and, thus, may perform functions in parts of the plants other than developing seeds. Cloning and characterization of the Sw4.1 QTL gives new insight into how plants change seed during evolution and may open future opportunities for modulating seed size in crop plants for human purposes. Given fixed resources, plants have a choice whether to produce many small seeds or a few large seeds. In terms of reproductive fitness, there are costs and benefits to both strategies. As a result, plant species vary more than 100,000-fold in both seed size and seed output. 
The current study focuses on understanding the molecular and developmental basis of a single genetic locus (or quantitative trait locus) that determines seed size between the cultivated tomato and its wild relatives. We show that the cause of size variation can be traced to a gene encoding an ABC transporter protein. The gene apparently exercises its control on seed size through expression in the developing seeds and not the mother plant that nurtures those seeds. A comparison with the model plant Arabidopsis thaliana suggests that the ABC transporter identified in tomato may also control seed size in other plants, opening research opportunities for understanding plant adaptation and for potentially modulating seed size in crop plants for human purposes.
Affiliation(s)
- Cintia Hotta Orsi
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York, United States of America
- Steven D. Tanksley
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York, United States of America
- Department of Plant Biology, Cornell University, Ithaca, New York, United States of America
24
Identification of reptilian genes encoding hair keratin-like proteins suggests a new scenario for the evolutionary origin of hair. Proc Natl Acad Sci U S A 2008; 105:18419-23. [PMID: 19001262 DOI: 10.1073/pnas.0805154105] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]
Abstract
The appearance of hair is one of the main evolutionary innovations in the amniote lineage leading to mammals. The main components of mammalian hair are cysteine-rich type I and type II keratins, also known as hard alpha-keratins or "hair keratins." To determine the evolutionary history of these important structural proteins, we compared the genomic loci of the human hair keratin genes with the homologous loci of the chicken and of the green anole lizard Anolis carolinensis. The genome of the chicken contained one type II hair keratin-like gene, and the lizard genome contained two type I and four type II hair keratin-like genes. Orthology of the latter genes and mammalian hair keratins was supported by gene locus synteny, conserved exon-intron organization, and amino acid sequence similarity of the encoded proteins. The lizard hair keratin-like genes were expressed most strongly in the digits, indicating a role in claw formation. In addition, we identified a novel group of reptilian cysteine-rich type I keratins that lack homologues in mammals. Our data show that cysteine-rich alpha-keratins are not restricted to mammals and suggest that the evolution of mammalian hair involved the co-option of pre-existing structural proteins.
25
Lehmann J, Stadler PF, Prohaska SJ. SynBlast: assisting the analysis of conserved synteny information. BMC Bioinformatics 2008; 9:351. [PMID: 18721485 PMCID: PMC2543028 DOI: 10.1186/1471-2105-9-351] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 04/03/2008] [Accepted: 08/24/2008] [Indexed: 01/06/2023]
Abstract
Motivation In the last years more than 20 vertebrate genomes have been sequenced, and the rate at which genomic DNA information becomes available is rapidly accelerating. Gene duplication and gene loss events inherently limit the accuracy of orthology detection based on sequence similarity alone. Fully automated methods for orthology annotation do exist but often fail to identify individual members in cases of large gene families, or to distinguish missing data from traceable gene losses. This situation can be improved in many cases by including conserved synteny information. Results Here we present the SynBlast pipeline that is designed to construct and evaluate local synteny information. SynBlast uses the genomic region around a focal reference gene to retrieve candidates for homologous regions from a collection of target genomes and ranks them in accord with the available evidence for homology. The pipeline is intended as a tool to aid high quality manual annotation in particular in those cases where automatic procedures fail. We demonstrate how SynBlast is applied to retrieving orthologous and paralogous clusters using the vertebrate Hox and ParaHox clusters as examples. Software The SynBlast package written in Perl is available under the GNU General Public License at .
Affiliation(s)
- Jörg Lehmann
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany.
26
De Grassi A, Lanave C, Saccone C. Genome duplication and gene-family evolution: the case of three OXPHOS gene families. Gene 2008; 421:1-6. [PMID: 18573316 DOI: 10.1016/j.gene.2008.05.011] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Received: 07/19/2007] [Revised: 05/15/2008] [Accepted: 05/21/2008] [Indexed: 10/22/2022]
Abstract
DNA duplication is one of the main forces acting on the evolution of organisms because it creates the raw genetic material that natural selection can subsequently modify. Duplicated regions are mainly due to "errors" in different phases of meiosis, but DNA transposable elements and reverse transcription also contribute to amplify and move the genomic material to different genomic locations. As a result, redundancy affects genomes to variable degrees: from the single gene to the whole genome (WGD). Gene families are clusters of genes created by duplication and their size reflects the number of duplicated genes, called paralogs, in each species. The aim of this review is to describe the state of the art in the identification and analysis of gene families in eukaryotes, with specific attention to those generated by ancient large scale events in vertebrates (WGD or large segmental duplications). As a case study, we report our work on the evolution of gene families encoding subunits of the five OXPHOS (oxidative phosphorylation) complexes, fundamental and highly conserved in all respiring cells. Although OXPHOS gene families are smaller than the general trend in nuclear gene families, some exceptions are observed, such as three gene families with at least two paralogs in vertebrates. These gene families encode cytochrome c (Cyt c, the electron shuttle protein between complex III and IV), Lipid Binding Protein (LBP, the channel protein of complex V which transfers protons through the inner mitochondrial membrane) and the MLRQ subunit (MLRQ, a supernumerary subunit of the large complex I, with unknown function). We provide a two-step approach, based on structural genomic data, to demonstrate that these gene families should have arisen through WGD (or large segmental duplication) events at the origin of vertebrates and, only afterwards, underwent species-specific events of further gene duplications and loss. 
In summary, this review reflects the need to apply genome comparative approaches, deriving from both "classical" molecular phylogenetic analysis and "new" genome map analysis, to successfully define the complex evolutionary relations between gene family members which, in turn, are essential to obtain any other comparative phylogenetic or functional results.
Affiliation(s)
- Anna De Grassi
- Istituto di Tecnologie Biomediche, Sede di Bari, CNR, Bari, Italy
|
27
|
Abstract
In recent years, it has become clear that all of the organisms on the Earth are related to each other in ways that can be documented by molecular sequence comparison. In this review, we focus on the evolutionary relationships among the proteins of the eukaryotes, especially those that allow inference of function from one species to another. Data and illustrations are derived from specific comparison of eight species: Homo sapiens, Mus musculus, Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio, Saccharomyces cerevisiae, and Plasmodium falciparum.
Affiliation(s)
- Kara Dolinski
- Department of Molecular Biology, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
|
28
|
Fu Z, Chen X, Vacic V, Nan P, Zhong Y, Jiang T. MSOAR: a high-throughput ortholog assignment system based on genome rearrangement. J Comput Biol 2008; 14:1160-75. [PMID: 17990975 DOI: 10.1089/cmb.2007.0048] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/12/2022]
Abstract
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics, since many computational methods for solving various biological problems critically rely on bona fide orthologs as input. While it is usually done using sequence similarity search, we recently proposed a new combinatorial approach that combines sequence similarity and genome rearrangement. This paper continues the development of the approach and unites genome rearrangement events and (post-speciation) duplication events in a single framework under the parsimony principle. In this framework, orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario involving both genome rearrangement and (post-speciation) gene duplication. Besides several original algorithmic contributions, the enhanced method allows for the detection of inparalogs. Following this approach, we have implemented a high-throughput system for ortholog assignment on a genome scale, called MSOAR, and applied it to human and mouse genomes. As the results show, MSOAR finds 99 more true orthologs than the INPARANOID program. In comparison to the iterated exemplar algorithm on simulated data, MSOAR performed favorably in terms of assignment accuracy. We also validated our predicted main ortholog pairs between human and mouse using public ortholog assignment datasets, synteny information, and gene function classification. These test results indicate that our approach is very promising for genome-wide ortholog assignment. Supplemental material and MSOAR program are available at http://msoar.cs.ucr.edu.
Affiliation(s)
- Zheng Fu
- Department of Computer Science and Engineering, University of California, Riverside, California 92521, USA.
|
29
|
Hu M, Choi K, Su W, Kim S, Yang J. A gene pattern mining algorithm using interchangeable gene sets for prokaryotes. BMC Bioinformatics 2008; 9:124. [PMID: 18302784 PMCID: PMC2279103 DOI: 10.1186/1471-2105-9-124] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/03/2007] [Accepted: 02/26/2008] [Indexed: 11/27/2022]
Abstract
BACKGROUND Mining gene patterns that are common to multiple genomes is an important biological problem, which can lead us to novel biological insights. When family classification of genes is available, this problem is similar to the pattern mining problem in the data mining community. However, when family classification information is not available, mining gene patterns is a challenging problem. There are several well-developed algorithms for predicting gene patterns in a pair of genomes, such as FISH and DAGchainer. These algorithms use the optimization problem formulation which is solved using the dynamic programming technique. Unfortunately, extending these algorithms to multiple genome cases is not trivial due to the rapid increase in time and space complexity. RESULTS In this paper, we propose a novel algorithm for mining gene patterns in more than two prokaryote genomes using interchangeable sets. The basic idea is to extend the pattern mining technique from the data mining community to handle the situation where family classification information is not available using interchangeable sets. In an experiment with four newly sequenced genomes (where the gene annotation is unavailable), we show that the gene pattern can capture important biological information. To examine the effectiveness of gene patterns further, we propose an ortholog prediction method based on our gene pattern mining algorithm and compare our method to the bi-directional best hit (BBH) technique in terms of COG orthologous gene classification information. The experiments show that our algorithm achieves a 3% increase in recall compared to BBH without sacrificing the precision of ortholog detection. CONCLUSION The discovered gene patterns can be used to detect orthologs and genes that collaborate for a common biological function.
Affiliation(s)
- Meng Hu
- EECS, Case Western Reserve University, Cleveland, OH 44106 USA.
|
30
|
Kullberg M, Hallström B, Arnason U, Janke A. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution. PLoS One 2007; 2:e775. [PMID: 17712423 PMCID: PMC1942079 DOI: 10.1371/journal.pone.0000775] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 01/23/2007] [Accepted: 07/24/2007] [Indexed: 11/25/2022]
Abstract
BACKGROUND We investigate the usefulness of expressed sequence tags (ESTs) for establishing divergences within the tree of placental mammals. We test this on the established relationships among primates (human), lagomorphs (rabbit), rodents (rat and mouse), artiodactyls (cow), carnivorans (dog) and proboscideans (elephant). METHODOLOGY/PRINCIPAL FINDINGS We have produced 2000 ESTs (1.2 megabases) from a marsupial mouse and characterized the data for their use in phylogenetic analysis. The sequences were used to identify putative orthologous sequences from whole genome projects. Although most ESTs stem from single sequence reads, the frequency of potential sequencing errors was found to be lower than allelic variation. Most of the sequences represented slowly evolving housekeeping-type genes, with an average amino acid distance of 6.6% between human and mouse. Positive Darwinian selection was identified at only a few single sites. Phylogenetic analyses of the EST data yielded trees that were consistent with those established from whole genome projects. CONCLUSIONS The general quality of EST sequences and the general absence of positive selection in these sequences make ESTs an attractive tool for phylogenetic analysis. The EST approach allows, at reasonable cost, a fast extension of data sampling from species outside the genome projects.
Affiliation(s)
- Morgan Kullberg
- Department of Cell and Organism Biology, Division of Evolutionary Molecular Systematics, University of Lund, Lund, Sweden.
|
31
|
Derrien T, André C, Galibert F, Hitte C. Analysis of the unassembled part of the dog genome sequence: chromosomal localization of 115 genes inferred from multispecies comparative genomics. J Hered 2007; 98:461-7. [PMID: 17573383 DOI: 10.1093/jhered/esm027] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 12/16/2022]
Abstract
The identification of dog genes and their accurate localization to chromosomes remain a major challenge in the postgenomics era. The 132 annotated canine genes with human orthologs remaining in the unassembled part (chrUnknown) of the dog sequence assembly (CanFam1) are of limited use for candidate gene approaches or comparative mapping studies. We used a two-step comparative analysis to infer a canine chromosomal interval for localization of the chrUn genes. We first constructed a human-dog synteny map, using 14,456 gene-based comparative anchors. We then mapped the 132 chrUn genes onto the reference (human) synteny map and identified the corresponding, orthologous segment on the canine map, based on conserved gene order. Our results show that 110 chrUn genes could be localized to short intervals on 18 dog chromosomes, whereas 22 genes remained assigned to 2 possible intervals. We extended this comparative analysis to multiple species, using the chimpanzee, mouse, and rat genome sequences. This made it possible to narrow down the intervals concerned and to increase the number of canine chrUn genes with an inferred chromosome location to 115. This study demonstrates that dog chromosomal intervals for chrUn genes can be rapidly inferred, using a reference species, and indicates that comparative strategies based on larger numbers of species may be even more effective.
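The two-step inference described in this abstract (build a reference-to-target synteny map, then place an unassembled gene between its flanking anchors) can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the authors' pipeline: the `Anchor` type, the `infer_interval` function, and the rule that both flanking anchors must lie on the same target chromosome are all hypothetical.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Optional, Tuple


class Anchor(NamedTuple):
    """Hypothetical comparative anchor: one gene located in both genomes."""
    ref_pos: int      # position on a single reference (e.g. human) chromosome
    target_chr: str   # chromosome of the ortholog in the target (e.g. dog)
    target_pos: int   # position on that target chromosome


def infer_interval(anchors: List[Anchor],
                   gene_ref_pos: int) -> Optional[Tuple[str, int, int]]:
    """Return (target_chr, start, end) bounded by the two flanking anchors.

    Returns None when the gene has no anchor on one side, or when the
    flanking anchors map to different target chromosomes (ambiguous).
    """
    ordered = sorted(anchors)  # sorts by ref_pos first
    i = bisect_left([a.ref_pos for a in ordered], gene_ref_pos)
    if i == 0 or i == len(ordered):
        return None
    left, right = ordered[i - 1], ordered[i]
    if left.target_chr != right.target_chr:
        return None
    start, end = sorted((left.target_pos, right.target_pos))
    return (left.target_chr, start, end)


# Toy anchors (invented values, for illustration only).
anchors = [
    Anchor(100, "chr5", 900),
    Anchor(300, "chr5", 700),
    Anchor(500, "chr12", 50),
]
placed = infer_interval(anchors, 200)     # flanked by two chr5 anchors
ambiguous = infer_interval(anchors, 400)  # flanking anchors disagree
```

Adding anchors from further species would, in this simplified picture, amount to intersecting the intervals returned for each reference genome.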
Affiliation(s)
- Thomas Derrien
- CNRS UMR6061 Génétique et Développement, Université de Rennes 1, IFR140, 2 Av du Pr Léon Bernard, CS 34317, 35043, Rennes, France
|
32
|
Fulton DL, Li YY, Laird MR, Horsman BGS, Roche FM, Brinkman FSL. Improving the specificity of high-throughput ortholog prediction. BMC Bioinformatics 2006; 7:270. [PMID: 16729895 PMCID: PMC1524997 DOI: 10.1186/1471-2105-7-270] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.8] [Received: 10/03/2005] [Accepted: 05/28/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Orthologs (genes that have diverged after a speciation event) tend to have similar function, and so their prediction has become an important component of comparative genomics and genome annotation. The gold standard phylogenetic analysis approach of comparing available organismal phylogeny to gene phylogeny is not easily automated for genome-wide analysis; therefore, ortholog prediction for large genome-scale datasets is typically performed using a reciprocal-best-BLAST-hits (RBH) approach. One problem with RBH is that it will incorrectly predict a paralog as an ortholog when incomplete genome sequences or gene loss is involved. In addition, there is an increasing interest in identifying orthologs most likely to have retained similar function. RESULTS To address these issues, we present here a high-throughput computational method named Ortholuge that further evaluates previously predicted orthologs (including those predicted using an RBH-based approach) - identifying which orthologs most closely reflect species divergence and may more likely have similar function. Ortholuge analyzes phylogenetic distance ratios involving two comparison species and an outgroup species, noting cases where relative gene divergence is atypical. It also identifies some cases of gene duplication after species divergence. Through simulations of incomplete genome data/gene loss, we show that the vast majority of genes falsely predicted as orthologs by an RBH-based method can be identified. Ortholuge was then used to estimate the number of false-positives (predominantly paralogs) in selected RBH-predicted ortholog datasets, identifying approximately 10% paralogs in a eukaryotic data set (mouse-rat comparison) and 5% in a bacterial data set (Pseudomonas putida - Pseudomonas syringae species comparison). Higher quality (more precise) datasets of orthologs, which we term "ssd-orthologs" (supporting-species-divergence-orthologs), were also constructed. 
These datasets, as well as Ortholuge software that may be used to characterize other species' datasets, are available at http://www.pathogenomics.ca/ortholuge/ (software under GNU General Public License). CONCLUSION The Ortholuge method reported here appears to significantly improve the specificity (precision) of high-throughput ortholog prediction for both bacterial and eukaryotic species. This method, and its associated software, will aid those performing various comparative genomics-based analyses, such as the prediction of conserved regulatory elements upstream of orthologous genes.
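The reciprocal-best-hit (RBH) step that this abstract takes as its starting point can be sketched in a few lines of Python. This is a minimal illustration, not Ortholuge itself: the function names and the toy similarity scores are hypothetical, and in practice the scores would come from an all-against-all sequence search.

```python
from typing import Dict, List, Tuple

# query gene -> {subject gene: similarity score (e.g. a bit score)}
Scores = Dict[str, Dict[str, float]]


def best_hit(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hit(a_vs_b)
    best_ba = best_hit(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores (invented values, for illustration only).
a_vs_b = {
    "a1": {"b1": 200.0, "b2": 150.0},
    "a2": {"b2": 180.0, "b1": 90.0},
    "a3": {"b2": 170.0},
}
b_vs_a = {
    "b1": {"a1": 210.0, "a2": 80.0},
    "b2": {"a2": 175.0, "a3": 160.0, "a1": 140.0},
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here `a3` has `b2` as its best hit, but `b2` prefers `a2`, so `a3` is left unpaired. If `a2` were missing from genome A (gene loss or an incomplete assembly), `a3` and `b2` would become reciprocal best hits, which is the kind of paralog-as-ortholog error the abstract describes.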
Affiliation(s)
- Debra L Fulton
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Yvonne Y Li
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Canada's Michael Smith Genome Sciences Centre, 570 W. 7th Avenue, Vancouver, BC, Canada
- Matthew R Laird
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Benjamin GS Horsman
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Fiona M Roche
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Fiona SL Brinkman
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
|
33
|
Nolan MA, Wu L, Bang HJ, Jelinsky SA, Roberts KP, Turner TT, Kopf GS, Johnston DS. Identification of rat cysteine-rich secretory protein 4 (Crisp4) as the ortholog to human CRISP1 and mouse Crisp4. Biol Reprod 2006; 74:984-91. [PMID: 16467491 DOI: 10.1095/biolreprod.105.048298] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Indexed: 11/01/2022]
Abstract
Cysteine-rich secretory proteins (CRISPs) are present in a diverse population of organisms and are defined by 16 conserved cysteine residues spanning a plant pathogenesis related-1 and a C-terminal cysteine-rich domain. To date, the diversification of mammalian CRISPs is evidenced by the existence of two, three, and four paralogous genes in the rat, human, and mouse, respectively. The current study identifies a third rat Crisp paralog we term Crisp4. The gene for Crisp4 is on rat chromosome 9 within 1 Mb of both the Crisp1 and Crisp2 genes. The full-length transcript for this gene was cloned from rat epididymal RNA and encodes a protein that shares 69% and 91% similarity with human CRISP1 and mouse CRISP4, respectively. Expression of rat Crisp4 is most abundant in the epididymis, with the highest levels of transcription observed in the caput and corpus epididymis. In contrast, rat CRISP4 protein is most abundant in the corpus and cauda regions of the epididymis. Rat CRISP4 protein is also present in caudal sperm extracts, appearing as a detergent-soluble form at the predicted molecular weight (26 kDa). Our data identify rat Crisp4 as the true ortholog to human CRISP1 and mouse Crisp4, and demonstrate its interaction with spermatozoa in the epididymis.
Affiliation(s)
- Michael A Nolan
- Contraception, Women's Health and Musculoskeletal Biology, Wyeth Research, Collegeville, Pennsylvania 19426, USA.
|
34
|
Philippe H, Delsuc F, Brinkmann H, Lartillot N. Phylogenomics. ANNUAL REVIEW OF ECOLOGY EVOLUTION AND SYSTEMATICS 2005. [DOI: 10.1146/annurev.ecolsys.35.112202.130205] [Citation(s) in RCA: 264] [Impact Index Per Article: 13.2] [Indexed: 11/09/2022]
Affiliation(s)
- Hervé Philippe
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec H3C3J7, Canada
- Frédéric Delsuc
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec H3C3J7, Canada
- Henner Brinkmann
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec H3C3J7, Canada
- Nicolas Lartillot
- Laboratoire d'Informatique, de Robotique et de Mathématiques de Montpellier, Centre National de la Recherche Scientifique, Université de Montpellier, 34392 Montpellier Cedex 5, France
|
35
|
Guerrero G, Peralta H, Aguilar A, Díaz R, Villalobos MA, Medrano-Soto A, Mora J. Evolutionary, structural and functional relationships revealed by comparative analysis of syntenic genes in Rhizobiales. BMC Evol Biol 2005; 5:55. [PMID: 16229745 PMCID: PMC1276791 DOI: 10.1186/1471-2148-5-55] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Received: 07/18/2005] [Accepted: 10/17/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Comparative genomics has provided valuable insights into the nature of gene sequence variation and chromosomal organization of closely related bacterial species. However, questions about the biological significance of gene order conservation, or synteny, remain open. Moreover, few comprehensive studies have been reported for rhizobial genomes. RESULTS We analyzed the genomic sequences of four fast growing Rhizobiales (Sinorhizobium meliloti, Agrobacterium tumefaciens, Mesorhizobium loti and Brucella melitensis). We made a comprehensive gene classification to define chromosomal orthologs, genes with homologs in other replicons such as plasmids, and those which were species-specific. About two thousand genes were predicted to be orthologs in each chromosome and about 80% of these were syntenic. A striking gene colinearity was found in pairs of organisms and a large fraction of the microsyntenic regions and operons were similar. Syntenic products showed higher identity levels than non-syntenic ones, suggesting a resistance to sequence variation due to functional constraints; also, an unusually high fraction of syntenic products contained membranal segments. Syntenic genes encode a high proportion of essential cell functions, presented a high level of functional relationships and a very low horizontal gene transfer rate. The sequence variability of the proteins can be considered the species signature in response to specific niche adaptation. Comparatively, an analysis with genomes of Enterobacteriales showed a different gene organization but gave similar results in the synteny conservation, essential role of syntenic genes and higher functional linkage among the genes of the microsyntenic regions. CONCLUSION Syntenic bacterial genes represent a commonly evolved group. 
They not only reveal the core chromosomal segments present in the last common ancestor and determine the metabolic characteristics shared by these microorganisms, but also show resistance to sequence variation and rearrangement, possibly due to their essential character. In Rhizobiales and Enterobacteriales, syntenic genes encode a high proportion of essential cell functions and presented a high level of functional relationships.
Affiliation(s)
- Gabriela Guerrero
- Program of Functional Genomics of Prokaryotes, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México. Ave. Universidad s/n (P.O. Box 565-A), Cuernavaca, Morelos, 62210, México
- Humberto Peralta
- Program of Functional Genomics of Prokaryotes, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México. Ave. Universidad s/n (P.O. Box 565-A), Cuernavaca, Morelos, 62210, México
- Alejandro Aguilar
- Program of Functional Genomics of Prokaryotes, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México. Ave. Universidad s/n (P.O. Box 565-A), Cuernavaca, Morelos, 62210, México
- Rafael Díaz
- Program of Functional Genomics of Prokaryotes, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México. Ave. Universidad s/n (P.O. Box 565-A), Cuernavaca, Morelos, 62210, México
- Miguel Angel Villalobos
- Program of Functional Genomics of Prokaryotes, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México. Ave. Universidad s/n (P.O. Box 565-A), Cuernavaca, Morelos, 62210, México
- Arturo Medrano-Soto
- Program of Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México. Ave. Universidad s/n (P.O. Box 565-A), Cuernavaca, Morelos, 62210, México
- Jaime Mora
- Program of Functional Genomics of Prokaryotes, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México. Ave. Universidad s/n (P.O. Box 565-A), Cuernavaca, Morelos, 62210, México
|
36
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|