1
A chromosome-level, haplotype-phased Vanilla planifolia genome highlights the challenge of partial endoreplication for accurate whole-genome assembly. PLANT COMMUNICATIONS 2022; 3:100330. [PMID: 35617961 PMCID: PMC9482989 DOI: 10.1016/j.xplc.2022.100330] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/30/2021] [Revised: 04/10/2022] [Accepted: 04/27/2022] [Indexed: 06/02/2023]
Abstract
Vanilla planifolia, the species cultivated to produce one of the world's most popular flavors, is highly prone to partial genome endoreplication, which leads to highly unbalanced DNA content in cells. We report here the first molecular evidence of partial endoreplication at the chromosome scale by the assembly and annotation of an accurate haplotype-phased genome of V. planifolia. Cytogenetic data demonstrated that the diploid genome size is 4.09 Gb, with 16 chromosome pairs, although aneuploid cells are frequently observed. Using PacBio HiFi and optical mapping, we assembled and phased a diploid genome of 3.4 Gb with a scaffold N50 of 1.2 Mb and 59 128 predicted protein-coding genes. The atypical k-mer frequencies and the uneven sequencing depth observed agreed with our expectation of unbalanced genome representation. Sixty-seven percent of the genes were scattered over only 30% of the genome, putatively linking gene-rich regions and the endoreplication phenomenon. By contrast, low-coverage regions (non-endoreplicated) were rich in repeated elements but also contained 33% of the annotated genes. Furthermore, this assembly showed distinct haplotype-specific sequencing depth variation patterns, suggesting complex molecular regulation of endoreplication along the chromosomes. This high-quality, anchored assembly represents 83% of the estimated V. planifolia genome. It provides a significant step toward the elucidation of this complex genome. To support post-genomics efforts, we developed the Vanilla Genome Hub, a user-friendly integrated web portal that enables centralized access to high-throughput genomic and other omics data and interoperable use of bioinformatics tools.
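For readers unfamiliar with the assembly metrics quoted in this abstract, the following is a minimal sketch of how a scaffold N50 and an assembled fraction are conventionally computed. The function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the paper; only the 3.4 Gb and 4.09 Gb figures come from the abstract.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)

# Assembled fraction using the two sizes quoted in the abstract.
assembled_gb = 3.4
estimated_gb = 4.09
assembled_percent = round(100 * assembled_gb / estimated_gb)

print(toy_n50, assembled_percent)
```

With the abstract's figures, the ratio 3.4 / 4.09 rounds to the 83% reported for the anchored assembly.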
2
RapGreen, an interactive software and web package to explore and analyze phylogenetic trees. NAR Genom Bioinform 2021; 3:lqab088. [PMID: 34568824 PMCID: PMC8459725 DOI: 10.1093/nargab/lqab088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2021] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 12/26/2022]
Abstract
RapGreen is a modular software package targeted at scientists handling large datasets for phylogenetic analysis. Its primary function is the graphical visualization and exploration of large trees. In addition, RapGreen offers a tree pattern search function to seek evolutionary scenarios among large collections of phylogenetic trees. Other functionalities include tree reconciliation with a given species tree: the detection of duplication or loss events during evolution and tree rooting. Last but not least, RapGreen features the ability to integrate heterogeneous data while visualizing and otherwise analyzing phylogenetic trees.
3
Three founding ancestral genomes involved in the origin of sugarcane. ANNALS OF BOTANY 2021; 127:827-840. [PMID: 33637991 PMCID: PMC8103802 DOI: 10.1093/aob/mcab008] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/10/2020] [Accepted: 02/25/2021] [Indexed: 05/11/2023]
Abstract
BACKGROUND AND AIMS Modern sugarcane cultivars (Saccharum spp.) are highly polyploid aneuploids (2n = ~12x = ~120) derived from interspecific hybridizations between the domesticated sweet species Saccharum officinarum and the wild species S. spontaneum. METHODS To analyse the architecture and origin of such a complex genome, we analysed the sequences of all 12 hom(oe)ologous haplotypes (BAC clones) from two distinct genomic regions of a typical modern cultivar, as well as the corresponding sequence in Miscanthus sinensis and Sorghum bicolor, and monitored their distribution among representatives of the Saccharum genus. KEY RESULTS The diversity observed among haplotypes suggested the existence of three founding genomes (A, B, C) in modern cultivars, which diverged between 0.8 and 1.3 Mya. Two genomes (A, B) were contributed by S. officinarum; these were also found in its wild presumed ancestor S. robustum, and one genome (C) was contributed by S. spontaneum. These results suggest that S. officinarum and S. robustum are derived from interspecific hybridization between two unknown ancestors (A and B genomes). The A genome contributed most haplotypes (nine or ten) while the B and C genomes contributed one or two haplotypes in the analysed regions of this typical modern cultivar. Interspecific hybridizations likely involved accessions or gametes with distinct ploidy levels and/or were followed by a series of backcrosses with the A genome. The three founding genomes were found in all S. barberi, S. sinense and modern cultivars analysed. None of the analysed accessions contained only the A genome or the B genome, suggesting that representatives of these founding genomes remain to be discovered. CONCLUSIONS This evolutionary model, which combines interspecificity and high polyploidy, can explain the variable chromosome pairing affinity observed in Saccharum. It represents a major revision of the understanding of Saccharum diversity.
4
Coconut genome assembly enables evolutionary analysis of palms and highlights signaling pathways involved in salt tolerance. Commun Biol 2021; 4:105. [PMID: 33483627 PMCID: PMC7822834 DOI: 10.1038/s42003-020-01593-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/30/2020] [Accepted: 12/09/2020] [Indexed: 01/30/2023]
Abstract
Coconut (Cocos nucifera) is the emblematic palm of tropical coastal areas all around the globe. It provides vital resources to millions of farmers. In an effort to better understand its evolutionary history and to develop genomic tools for its improvement, a sequence draft was recently released. Here, we present a dense linkage map (8402 SNPs) aiming to assemble the large genome of coconut (2.42 Gbp, 2n = 32) into 16 pseudomolecules. As a result, 47% of the sequences (representing 77% of the genes) were assigned to 16 linkage groups and ordered. We observed segregation distortion in chromosome Cn15, which is a signature of strong selection among pollen grains, favouring the maternal allele. Comparing our results with the genome of the oil palm Elaeis guineensis allowed us to identify major events in the evolutionary history of palms. We find that coconut underwent a massive transposable element invasion in the last million years, which could be related to the fluctuations of sea level during the glaciations at Pleistocene that would have triggered a population bottleneck. Finally, to better understand the facultative halophyte trait of coconut, we conducted an RNA-seq experiment on leaves to identify key players of signaling pathways involved in salt stress response. Altogether, our findings represent a valuable resource for the coconut breeding community.
5
Transcriptional Regulation of Sorghum Stem Composition: Key Players Identified Through Co-expression Gene Network and Comparative Genomics Analyses. FRONTIERS IN PLANT SCIENCE 2020; 11:224. [PMID: 32194601 PMCID: PMC7064007 DOI: 10.3389/fpls.2020.00224] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/06/2019] [Accepted: 02/12/2020] [Indexed: 06/10/2023]
Abstract
Most sorghum biomass accumulates in stem secondary cell walls (SCW). As sorghum stems are used as raw materials for various purposes such as feed, energy and fiber-reinforced polymers, identifying the genes responsible for SCW establishment is highly important. Taking advantage of studies performed in model species, most of the structural genes contributing at the molecular level to SCW biosynthesis in sorghum have been proposed, while their regulatory factors have mostly not been determined. Validation of the role of several MYB and NAC transcription factors in SCW regulation in Arabidopsis and a few other species has been provided. In this study, we contributed to the recent efforts made in grasses to uncover the mechanisms underlying SCW establishment. We reported updated phylogenies of NAC and MYB in 9 different species and exploited findings from other species to highlight candidate regulators of SCW in sorghum. We acquired expression data during sorghum internode development and used co-expression analyses to determine groups of co-expressed genes that are likely to be involved in SCW establishment. We were able to identify two groups of co-expressed genes presenting multiple lines of evidence of involvement in SCW building. Gene enrichment analysis of MYB and NAC genes provided evidence that while the functions of the NAC SECONDARY WALL THICKENING PROMOTING FACTOR (NST) genes and the SECONDARY WALL-ASSOCIATED NAC DOMAIN PROTEIN gene appear to be conserved in sorghum, NAC master regulators of SCW in sorghum may not be as tissue compartmentalized as in Arabidopsis. We showed that for every homolog of the key SCW MYB in Arabidopsis, a similar role is expected for sorghum. In addition, we unveiled sorghum MYB and NAC that have not been identified to date as being involved in cell wall regulation. Although specific validation of the MYB and NAC genes uncovered in this study is needed, we provide a network of sorghum genes involved in SCW both at the structural and regulatory levels.
6
Detection of significant SNP associated with production and oil quality traits in interspecific oil palm hybrids using RARSeq. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2020; 291:110366. [PMID: 31928673 DOI: 10.1016/j.plantsci.2019.110366] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/27/2019] [Revised: 11/28/2019] [Accepted: 11/30/2019] [Indexed: 06/10/2023]
Abstract
A RARSeq-based association mapping study was performed in a population of 104 Elaeis oleifera x E. guineensis hybrids of five origins with the aim of finding functional markers associated with six production and 19 oil quality traits. For this purpose, mRNA of each genotype was isolated and double-stranded cDNA was synthesized. Following digestion with two restriction enzymes and adapter ligation, a size-selected pool of barcoded amplicons was produced and sequenced using Illumina MiSeq. The obtained sequences were processed with a "snakemake" pipeline and filtered, and missing values were imputed. For all traits except two, a significant effect of the origin was observed. Genetic diversity analyses revealed high variability within origins and an excess of heterozygosity in the population. Two GLM models with a Q matrix or PCA matrix as covariates and two MLM models additionally incorporating a kinship matrix were tested for genotype-phenotype associations using GAPIT software. Using unadjusted p values (< 0.01), 78 potential associations were detected involving 25 SNP and 20 traits. When applying FDR multiple testing with p < 0.05, 25 significant associations remained involving eight SNP and six quality traits. Four SNP were located in genes with a potentially relevant biological meaning.
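The drop from 78 unadjusted associations to 25 after FDR correction can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment, one common FDR procedure. The abstract does not say which FDR method was applied, so the choice of Benjamini-Hochberg is an assumption; this is not GAPIT code, and the function name `benjamini_hochberg` and the toy p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that adjusted
    # values never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for five SNP-trait tests (not from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(toy_p)

n_unadjusted = sum(p < 0.05 for p in toy_p)
n_adjusted = sum(p < 0.05 for p in adjusted)
print(n_unadjusted, n_adjusted)
```

In this toy set, four tests pass an unadjusted 0.05 threshold but only two survive the adjustment, which is the same kind of attrition the abstract reports.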
7
Association Mapping Between Candidate Gene SNP and Production and Oil Quality Traits in Interspecific Oil Palm Hybrids. PLANTS 2019; 8:plants8100377. [PMID: 31561627 PMCID: PMC6843369 DOI: 10.3390/plants8100377] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/19/2019] [Revised: 09/13/2019] [Accepted: 09/14/2019] [Indexed: 01/07/2023]
Abstract
Oil palm production is gaining importance in Central and South America. However, the main species Elaeis guineensis (Eg) is suffering severely from bud rot disease, restricting the potential cultivation areas. Therefore, breeding companies have started to work with interspecific Elaeis oleifera × Eg (Eo × Eg) hybrids, which are tolerant to this disease. We performed association studies between candidate gene (CG) single nucleotide polymorphisms (SNP) and six production and 19 oil quality traits in 198 accessions of interspecific oil palm hybrids from five different origins. For this purpose, barcoded amplicons of initially 167 CG were produced from each genotype and sequenced with Ion Torrent. After sequence cleaning, 115 SNP remained, targeting 62 CG. The influence of the origins on the different traits was analyzed and a genetic diversity study was performed. Two generalized linear models (GLM) with principal component analysis (PCA) or structure (Q) matrices as covariates and two mixed linear models (MLM) that additionally included a kinship (K) matrix were applied for association mapping using GAPIT. False discovery rate (FDR) multiple testing corrections were applied in order to avoid Type I errors. However, with FDR-adjusted p values, no significant associations between SNP and traits were detected. Using unadjusted p values below 0.05, seven of the studied CG showed potential associations with production traits, while 23 CG may influence different quality traits. Under these conditions, the current approach and the detected candidate genes could be exploited for selecting genotypes with superior CG alleles in Marker Assisted Selection systems.
8
Transcriptome data from three endemic Myrtaceae species from New Caledonia displaying contrasting responses to myrtle rust (Austropuccinia psidii). Data Brief 2019; 22:794-811. [PMID: 30766900 PMCID: PMC6362868 DOI: 10.1016/j.dib.2018.12.080] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/16/2018] [Revised: 12/14/2018] [Accepted: 12/24/2018] [Indexed: 12/30/2022]
Abstract
Myrtle rust disease, caused by the fungus Austropuccinia psidii, infects a wide range of host species within the Myrtaceae family worldwide. Since its first report in 2013 in New Caledonia, it has been found in various types of native environments where Myrtaceae are the dominant or codominant species, as well as in several commercial nurseries. It is now considered a significant threat to ecosystem biodiversity and the Myrtaceae-related economy. The use of predictive molecular markers for resistance against myrtle rust is currently the most cost-effective and ecological approach to control the disease. Such an approach for New Caledonian endemic Myrtaceae species was not possible because of the lack of genomic resources. Recent advances in new-generation sequencing technologies, accompanied by relevant bioinformatics tools, now provide new research opportunities for work in non-model organisms at the transcriptomic level. The present study focuses on transcriptome analysis of three Myrtaceae species endemic to New Caledonia (Arillastrum gummiferum, Syzygium longifolium and Tristaniopsis glauca) that display contrasting responses to the pathogen (non-infected vs infected). Differential gene expression (DGE) and variant calling analyses were conducted on each species. We used a dual approach based on 1) the annotated reference genome of a related Myrtaceae species (Eucalyptus grandis) and 2) de novo transcriptomes of each species.
9
Abstract
Coconut palm (Cocos nucifera, 2n = 32), a member of genus Cocos and family Arecaceae (Palmaceae), is an important tropical fruit and oil crop. Currently, coconut palm is cultivated in 93 countries, including Central and South America, East and West Africa, Southeast Asia and the Pacific Islands, with a total growth area of more than 12 million hectares [1]. Coconut palm is generally classified into 2 main categories: “Tall” (flowering 8–10 years after planting) and “Dwarf” (flowering 4–6 years after planting), based on morphological characteristics and breeding habits. This palm species has a long growth period before its reproductive years, which hinders conventional breeding progress. In spite of initial successes, improvements made by conventional breeding have been very slow. In the present study, we obtained de novo sequences of the Cocos nucifera genome: a major genomic resource that could be used to facilitate molecular breeding in Cocos nucifera and accelerate the breeding process in this important crop. A total of 419.67 gigabases (Gb) of raw reads were generated by the Illumina HiSeq 2000 platform using a series of paired-end and mate-pair libraries, covering the predicted Cocos nucifera genome length (2.42 Gb, variety “Hainan Tall”) to an estimated ×173.32 read depth. A total scaffold length of 2.20 Gb was generated (N50 = 418 Kb), representing 90.91% of the genome. The coconut genome was predicted to harbor 28 039 protein-coding genes, which is fewer than in Phoenix dactylifera (PDK30: 28 889), Phoenix dactylifera (DPV01: 41 660), and Elaeis guineensis (EG5: 34 802). BUSCO evaluation demonstrated that the obtained scaffold sequences covered 90.8% of the coconut genome and that the genome annotation was 74.1% complete. Genome annotation results revealed that 72.75% of the coconut genome consisted of transposable elements, of which long terminal repeat retrotransposon elements (LTRs) accounted for the largest proportion (92.23%).
Comparative analysis of the antiporter gene family and ion channel gene families between C. nucifera and Arabidopsis thaliana indicated that significant gene expansion may have occurred in the coconut involving Na+/H+ antiporter, carnitine/acylcarnitine translocase, potassium-dependent sodium-calcium exchanger, and potassium channel genes. Despite its agronomic importance, C. nucifera is still under-studied. In this report, we present a draft genome of C. nucifera and provide genomic information that will facilitate future functional genomics and molecular-assisted breeding in this crop species.
10
Genomic preselection with genotyping-by-sequencing increases performance of commercial oil palm hybrid crosses. BMC Genomics 2017; 18:839. [PMID: 29096603 PMCID: PMC5667528 DOI: 10.1186/s12864-017-4179-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 04/20/2017] [Accepted: 10/05/2017] [Indexed: 01/14/2023]
Abstract
Background There is great potential for the genetic improvement of oil palm yield. Traditional progeny tests allow accurate selection but limit the number of individuals evaluated. Genomic selection (GS) could overcome this constraint. We estimated the accuracy of GS prediction of seven oil yield components using A × B hybrid progeny tests with almost 500 crosses for training and 200 crosses for independent validation. Genotyping-by-sequencing (GBS) yielded more than 5000 single nucleotide polymorphisms (SNPs) on the parents of the crosses. The genomic best linear unbiased prediction method gave genomic predictions using the SNPs of the training and validation sets and the phenotypes of the training crosses. The practical impact was illustrated by quantifying the additional bunch production of the crosses selected in the validation experiment if genomic preselection had been applied in the parental populations before progeny tests. Results We found that prediction accuracies for cross values plateaued at 500 to 2000 SNPs, with high (0.73) or low (0.28) values depending on traits. Similar results were obtained when parental breeding values were predicted. GS was able to capture genetic differences within parental families, requiring at least 2000 SNPs with less than 5% missing data, imputed using pedigrees. Genomic preselection could have increased the selected hybrids' bunch production by more than 10%. Conclusions Preselection for yield components using GBS is the first possible application of GS in oil palm. This will increase selection intensity, thus improving the performance of commercial hybrids. Further research is required to increase the benefits from GS, which should revolutionize oil palm breeding. Electronic supplementary material The online version of this article (10.1186/s12864-017-4179-3) contains supplementary material, which is available to authorized users.
11
Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut. PLoS One 2017; 12:e0173300. [PMID: 28334050 PMCID: PMC5363918 DOI: 10.1371/journal.pone.0173300] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/31/2016] [Accepted: 02/17/2017] [Indexed: 01/20/2023]
Abstract
The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as reference species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of protein products in A. thaliana by 13%, often allowing recovery of 100% of the protein sequence length. In addition, redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of which derived from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and a substantial number of metabolic pathways related to secondary metabolism. The new sequences found with the BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).
12
Abstract
The whole genome sequence of Coffea canephora, the perennial diploid species known as Robusta, has been recently released. In the context of the C. canephora genome sequencing project and to support post-genomics efforts, we developed the Coffee Genome Hub (http://coffee-genome.org/), an integrative genome information system that allows centralized access to genomics and genetics data and analysis tools to facilitate translational and applied research in coffee. We provide the complete genome sequence of C. canephora along with gene structure, gene product information, metabolism, gene families, transcriptomics, syntenic blocks, genetic markers and genetic maps. The hub relies on generic software (e.g. GMOD tools) for easy querying, visualizing and downloading research data. It includes a Genome Browser enhanced by a Community Annotation System, enabling the improvement of automatic gene annotation through an annotation editor. In addition, the hub aims at developing interoperability among other existing South Green tools managing coffee data (phylogenomics resources, SNPs) and/or supporting data analyses with the Galaxy workflow manager.
13
Abstract
Coffee is a valuable beverage crop due to its characteristic flavor, aroma, and the stimulating effects of caffeine. We generated a high-quality draft genome of the species Coffea canephora, which displays a conserved chromosomal gene order among asterid angiosperms. Although it shows no sign of the whole-genome triplication identified in Solanaceae species such as tomato, the genome includes several species-specific gene family expansions, among them N-methyltransferases (NMTs) involved in caffeine production, defense-related genes, and alkaloid and flavonoid enzymes involved in secondary compound synthesis. Comparative analyses of caffeine NMTs demonstrate that these genes expanded through sequential tandem duplications independently of genes from cacao and tea, suggesting that caffeine in eudicots is of polyphyletic origin.
14
Expansion of banana (Musa acuminata) gene families involved in ethylene biosynthesis and signalling after lineage-specific whole-genome duplications. THE NEW PHYTOLOGIST 2014; 202:986-1000. [PMID: 24716518 DOI: 10.1111/nph.12710] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 08/30/2013] [Accepted: 12/25/2013] [Indexed: 05/26/2023]
Abstract
Whole-genome duplications (WGDs) are widespread in plants, and three lineage-specific WGDs occurred in the banana (Musa acuminata) genome. Here, we analysed the impact of WGDs on the evolution of banana gene families involved in ethylene biosynthesis and signalling, a key pathway for banana fruit ripening. Banana ethylene pathway genes were identified using comparative genomics approaches and their duplication modes and expression profiles were analysed. Seven out of 10 banana ethylene gene families evolved through WGD and four of them (1-aminocyclopropane-1-carboxylate synthase (ACS), ethylene-insensitive 3-like (EIL), ethylene-insensitive 3-binding F-box (EBF) and ethylene response factor (ERF)) were preferentially retained. Banana orthologues of AtEIN3 and AtEIL1, two major genes for ethylene signalling in Arabidopsis, were particularly expanded. This expansion was paralleled by that of EBF genes which are responsible for control of EIL protein levels. Gene expression profiles in banana fruits suggested functional redundancy for several MaEBF and MaEIL genes derived from WGD and subfunctionalization for some of them. We propose that EIL and EBF genes were co-retained after WGD in banana to maintain balanced control of EIL protein levels and thus avoid detrimental effects of constitutive ethylene signalling. In the course of evolution, subfunctionalization was favoured to promote finer control of ethylene signalling.
15
Abstract
Banana is one of the world’s favorite fruits and one of the most important crops for developing countries. The banana reference genome sequence (Musa acuminata) was recently released. Given the taxonomic position of Musa, the completed genomic sequence has particular comparative value to provide fresh insights about the evolution of the monocotyledons. The study of the banana genome has been enhanced by a number of tools and resources that allow its sequence to be harnessed. First, we set up essential tools such as a Community Annotation System, phylogenomics resources and metabolic pathways. Then, to support post-genomic efforts, we improved existing banana systems (e.g. web front end, query builder), integrated available Musa data into generic systems (e.g. markers and genetic maps, synteny blocks), made other existing systems containing Musa data interoperable with the banana hub (e.g. transcriptomics, rice reference genome, workflow manager) and, finally, generated new results from sequence analyses (e.g. SNP and polymorphism analysis). Several use cases illustrate how the Banana Genome Hub can be used to study gene families. Overall, with this collaborative effort, we discuss the importance of interoperability for data integration between existing information systems. Database URL: http://banana-genome.cirad.fr/
16
The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 2012; 488:213-7. [PMID: 22801500 DOI: 10.1038/nature11241] [Citation(s) in RCA: 603] [Impact Index Per Article: 50.3] [Received: 02/10/2012] [Accepted: 05/18/2012] [Indexed: 01/17/2023]
Abstract
Bananas (Musa spp.), including dessert and cooking types, are giant perennial monocotyledonous herbs of the order Zingiberales, a sister group to the well-studied Poales, which include cereals. Bananas are vital for food security in many tropical and subtropical countries and the most popular fruit in industrialized countries. The Musa domestication process started some 7,000 years ago in Southeast Asia. It involved hybridizations between diverse species and subspecies, fostered by human migrations, and selection of diploid and triploid seedless, parthenocarpic hybrids thereafter widely dispersed by vegetative propagation. Half of the current production relies on somaclones derived from a single triploid genotype (Cavendish). Pests and diseases have gradually become adapted, representing an imminent danger for global banana production. Here we describe the draft sequence of the 523-megabase genome of a Musa acuminata doubled-haploid genotype, providing a crucial stepping-stone for genetic improvement of banana. We detected three rounds of whole-genome duplications in the Musa lineage, independently of those previously described in the Poales lineage and the one we detected in the Arecales lineage. This first monocotyledon high-continuity whole-genome sequence reported outside Poales represents an essential bridge for comparative genome analysis in plants. As such, it clarifies commelinid-monocotyledon phylogenetic relationships, reveals Poaceae-specific features and has led to the discovery of conserved non-coding sequences predating monocotyledon-eudicotyledon divergence.
17
Abstract
Summary: We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl. Availability: The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system, and documentation is available at http://www.gnpannot.org/content/chado-controller-doc. The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form. Contact: valentin.guignon@cirad.fr; stephanie.sidibe-bocs@cirad.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
|
18
|
High homologous gene conservation despite extreme autopolyploid redundancy in sugarcane. THE NEW PHYTOLOGIST 2011; 189:629-42. [PMID: 21039564 DOI: 10.1111/j.1469-8137.2010.03497.x] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 05/09/2023]
Abstract
Modern sugarcane (Saccharum spp.) is the leading sugar crop and a primary energy crop. It has the highest level of 'vertical' redundancy (2n=12x=120) of all polyploid plants studied to date. It was produced about a century ago through hybridization between two autopolyploid species, namely S. officinarum and S. spontaneum. In order to investigate the genome dynamics in this highly polyploid context, we sequenced and compared seven hom(oe)ologous haplotypes (bacterial artificial chromosome clones). Our analysis revealed a high level of gene retention and colinearity, as well as high gene structure and sequence conservation, with an average sequence divergence of 4% for exons. Remarkably, all of the hom(oe)ologous genes were predicted as being functional (except for one gene fragment) and showed signs of evolving under purifying selection, with the exception of genes within segmental duplications. By contrast, transposable elements displayed a general absence of colinearity among hom(oe)ologous haplotypes and appeared to have undergone dynamic expansion in Saccharum, compared with sorghum, its close relative in the Andropogoneae tribe. These results reinforce the general trend emerging from recent studies indicating the diverse and nuanced effect of polyploidy on genome dynamics.
|
19
|
Mechanisms of haplotype divergence at the RGA08 nucleotide-binding leucine-rich repeat gene locus in wild banana (Musa balbisiana). BMC PLANT BIOLOGY 2010; 10:149. [PMID: 20637079 PMCID: PMC3017797 DOI: 10.1186/1471-2229-10-149] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 09/16/2009] [Accepted: 07/16/2010] [Indexed: 05/09/2023]
Abstract
BACKGROUND Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species, Musa acuminata (A genome) and Musa balbisiana (B genome), contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease, but little is known about the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). RESULTS Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions and in duplicated non-coding intergenic sequences, including low complexity regions, and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes, pointing to the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. CONCLUSIONS A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. The high level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana.
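The abstract above dates the haplotype split from the synonymous substitution rate. A minimal sketch of the usual molecular-clock arithmetic, T = Ks / (2r), is given here as an illustration only: the function name, the Ks value and the per-site rate are assumptions chosen to show the calculation, not figures reported by the authors.

```python
def divergence_time_years(ks: float, rate_per_site_per_year: float) -> float:
    """Estimate time since two sequences diverged under a simple molecular clock.

    ks: synonymous substitutions per synonymous site between the two sequences.
    rate_per_site_per_year: assumed synonymous substitution rate per lineage.
    The factor 2 accounts for substitutions accumulating on both lineages.
    """
    if ks < 0 or rate_per_site_per_year <= 0:
        raise ValueError("ks must be >= 0 and the rate must be > 0")
    return ks / (2.0 * rate_per_site_per_year)


# Hypothetical inputs, for illustration only (not taken from the abstract).
example_ks = 0.009
example_rate = 4.5e-9

estimate = divergence_time_years(example_ks, example_rate)
print(f"Estimated divergence time: {estimate:.2e} years")
```

The estimate scales linearly with Ks and inversely with the assumed rate, so the choice of rate dominates the uncertainty of any such date.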
|
20
|
Abstract
Magnifying Genomes (MaGe) is a microbial genome annotation system based on a relational database containing information on bacterial genomes, as well as a web interface to achieve genome annotation projects. Our system allows one to initiate the annotation of a genome at the early stage of the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes enhanced by a gene coding re-annotation process using accurate gene models, (ii) integration of results obtained with a wide range of bioinformatics methods, among which exploration of gene context by searching for conserved synteny and reconstruction of metabolic pathways, (iii) an advanced web interface allowing multiple users to refine the automatic assignment of gene product functions. MaGe is also linked to numerous well-known biological databases and systems. Our system has been thoroughly tested during the annotation of complete bacterial genomes (Acinetobacter baylyi ADP1, Pseudoalteromonas haloplanktis, Frankia alni) and is currently used in the context of several new microbial genome annotation projects. In addition, MaGe allows for annotation curation and exploration of already published genomes from various genera (e.g. Yersinia, Bacillus and Neisseria). MaGe can be accessed at .
|
21
|
Abstract
The annotation of newly sequenced bacterial genomes begins with running several automatic analysis methods, with major emphasis on the identification of protein-coding genes. DNA sequences are heterogeneous in local nucleotide composition and this leads sometimes to sequences being annotated as authentic genes when they are not protein-coding genes or are true but uncharacterized protein-coding genes. This first annotation step is generally followed by an expert manual annotation of the predicted genes. The genomic data (sequence and annotations) organized in an appropriate databank file format is subsequently submitted to an entry point of the International Nucleotide Sequence Database. These procedures are inevitably subject to mistakes, and this can lead to unintentional syntactic annotation errors being stored in public databanks. Here, we present a new web program, MICheck (MIcrobial genome Checker), that enables rapid verification of sets of annotated genes and frameshifts in previously published bacterial genomes. The web interface allows one easily to investigate the MICheck results, i.e. inaccurate or missed gene annotations: a graphical representation is drawn, in which the genomic context of a unique coding DNA sequence annotation or a predicted frameshift is given, using information on the coding potential (curves) and annotation of the neighbouring genes. We illustrate some capabilities of the MICheck site through the analysis of 20 bacterial genomes, 9 of which were selected for their ‘Reviewed’ status in the National Center for Biotechnology Information (NCBI) Reference Sequence Project (RefSeq). In the context of the numerous re-annotation projects for microbial genomes, this tool can be seen as a preliminary step before the functional re-annotation step to check quickly for missing or wrongly annotated genes. The MICheck website is accessible at the following address: .
|
22
|
The genome sequence of the entomopathogenic bacterium Photorhabdus luminescens. Nat Biotechnol 2003; 21:1307-13. [PMID: 14528314 DOI: 10.1038/nbt886] [Citation(s) in RCA: 407] [Impact Index Per Article: 19.4] [Received: 05/07/2003] [Accepted: 08/18/2003] [Indexed: 11/09/2022]
Abstract
Photorhabdus luminescens is a symbiont of nematodes and a broad-spectrum insect pathogen. The complete genome sequence of strain TT01 is 5,688,987 base pairs (bp) long and contains 4,839 predicted protein-coding genes. Strikingly, it encodes a large number of adhesins, toxins, hemolysins, proteases and lipases, and contains a wide array of antibiotic synthesizing genes. These proteins are likely to play a role in the elimination of competitors, host colonization, invasion and bioconversion of the insect cadaver, making P. luminescens a promising model for the study of symbiosis and host-pathogen interactions. Comparison with the genomes of related bacteria reveals the acquisition of virulence factors by extensive horizontal transfer and provides clues about the evolution of an insect pathogen. Moreover, newly identified insecticidal proteins may be effective alternatives for the control of insect pests.
|
23
|
Abstract
AMIGene (Annotation of MIcrobial Genes) is an application for automatically identifying the most likely coding sequences (CDSs) in a large contig or a complete bacterial genome sequence. The first step in AMIGene is dedicated to the construction of Markov models that fit the input genomic data (i.e. the gene model), followed by the combination of well-known gene-finding methods and a heuristic approach for the selection of the most likely CDSs. The web interface allows the user to select one or several gene models applied to the analysis of the input sequence by the AMIGene program and to visualize the list of predicted CDSs graphically and in a downloadable text format. The AMIGene web site is accessible at the following address: http://www.genoscope.cns.fr/agc/tools/amigene/index.html (Contact: sbocs@genoscope.cns.fr).
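The general idea of fitting a Markov model to coding sequence and using it to rank candidate CDSs can be sketched as follows. This is not AMIGene's implementation: the function names, the fixed model order, the pseudocount and the toy training sequences are all assumptions made for illustration, and the sketch assumes sequences contain only A, C, G and T.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, pseudocount=1.0):
    """Fit a fixed-order Markov model; return a log-probability function."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0

    def log_prob(context, base):
        # Additive smoothing over the four nucleotides, so unseen
        # contexts fall back to a uniform distribution.
        row = counts.get(context, {})
        total = sum(row.values()) + 4 * pseudocount
        return math.log((row.get(base, 0.0) + pseudocount) / total)

    return log_prob


def coding_score(seq, coding_lp, background_lp, order=2):
    """Log-likelihood ratio of seq under the coding vs background model."""
    seq = seq.upper()
    return sum(
        coding_lp(seq[i - order:i], seq[i]) - background_lp(seq[i - order:i], seq[i])
        for i in range(order, len(seq))
    )


# Toy training data (hypothetical, far too small for real use).
coding_lp = train_markov(["ATGGCTGCTGCTGCTGCTGCTTAA"])
background_lp = train_markov(["ATATATATATATTATATATA"])

for candidate in ("GCTGCTGCTGCT", "ATATATATAT"):
    print(candidate, round(coding_score(candidate, coding_lp, background_lp), 3))
```

A positive score means the candidate resembles the coding training set more than the background. A real gene finder would combine such scores with start and stop codon detection and frame-aware models.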
|
24
|
Abstract
Despite extensive annotation by two independent teams, the Helicobacter pylori genome appeared to lack a complete secretion machinery. The use of clinical isolates to substantiate in silico annotation is used here to identify the missing secE component of the major secretion machinery of Helicobacter pylori.
|
25
|
|
26
|
Re-annotation of genome microbial coding-sequences: finding new genes and inaccurately annotated genes. BMC Bioinformatics 2002; 3:5. [PMID: 11879526 PMCID: PMC77393 DOI: 10.1186/1471-2105-3-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.5] [Received: 09/18/2001] [Accepted: 02/05/2002] [Indexed: 11/21/2022]
Abstract
BACKGROUND Analysis of any newly sequenced bacterial genome starts with the identification of protein-coding genes. Despite the accumulation of multiple complete genome sequences, which provide useful comparisons with close relatives among other organisms during the annotation process, accurate gene prediction remains quite difficult. A major reason for this situation is that genes are tightly packed in prokaryotes, resulting in frequent overlap. Thus, detection of translation initiation sites and/or selection of the correct coding regions remain difficult unless appropriate biological knowledge (about the structure of a gene) is imbedded in the approach. RESULTS We have developed a new program that automatically identifies biologically significant candidate genes in a bacterial genome. Twenty-six complete prokaryotic genomes were analyzed using this tool, and the accuracy of gene finding was assessed by comparison with existing annotations. This analysis revealed that, despite the enormous effort of genome program annotators, a small but not negligible number of genes annotated within the framework of sequencing projects are likely to be partially inaccurate or plainly wrong. Moreover, the analysis of several putative new genes shows that, as expected, many short genes have escaped annotation. In most cases, these new genes revealed frameshifts that could be either artifacts or genuine frameshifts. Some entirely unexpected new genes have also been identified. This allowed us to get a more complete picture of prokaryotic genomes. The results of this procedure are progressively integrated into the SWISS-PROT reference databank. CONCLUSIONS The results described in the present study show that our procedure is very satisfactory in terms of gene finding accuracy. 
Except in a few cases, discrepancies between our results and annotations provided by individual authors can be accounted for by the nature of each annotation process or by specific characteristics of some genomes. This stresses that close cooperation between scientists and regular updating and curation of the findings in databases are clearly required to reduce the level of errors in genome annotation (and hence the unfortunate spreading of errors through centralized data libraries).
|