1
Mishra S, Srivastava AK, Khan AW, Tran LSP, Nguyen HT. The era of panomics-driven gene discovery in plants. Trends Plant Sci 2024:S1360-1385(24)00063-3. [PMID: 38658292 DOI: 10.1016/j.tplants.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 03/01/2024] [Accepted: 03/08/2024] [Indexed: 04/26/2024]
Abstract
Panomics is an approach to integrate multiple 'omics' datasets, generated using different individuals or natural variations. Considering their diverse phenotypic spectrum, the phenome is inherently associated with panomics-based science, which is further combined with genomics, transcriptomics, metabolomics, and other omics techniques, either independently or collectively. Panomics has been accelerated through recent technological advancements in the field of genomics that enable the detection of population-wide structural variations (SVs) and hence offer unprecedented insights into the genetic variations contributing to important agronomic traits. The present review provides the recent trends of panomics-driven gene discovery toward various traits related to plant development, stress tolerance, accumulation of specialized metabolites, and domestication/dedomestication. In addition, the success stories are highlighted in the broader context of enhancing crop productivity.
Affiliation(s)
- Shefali Mishra
- Nuclear Agriculture and Biotechnology Division, Bhabha Atomic Research Centre, Mumbai, Maharashtra 400085, India
- Ashish Kumar Srivastava
- Nuclear Agriculture and Biotechnology Division, Bhabha Atomic Research Centre, Mumbai, Maharashtra 400085, India; Homi Bhabha National Institute, Mumbai 400094, India.
- Aamir W Khan
- Division of Plant Science and Technology and National Center for Soybean Biotechnology, University of Missouri-Columbia, Columbia, MO 65211, USA
- Lam-Son Phan Tran
- Institute of Genomics for Crop Abiotic Stress Tolerance, Department of Plant and Soil Science, Texas Tech University, Lubbock, TX, USA
- Henry T Nguyen
- Division of Plant Science and Technology and National Center for Soybean Biotechnology, University of Missouri-Columbia, Columbia, MO 65211, USA.
2
Lian S, Chen Y, Zhou Y, Feng T, Chen J, Liang L, Qian Y, Huang T, Zhang C, Wu F, Zou W, Li Z, Meng L, Li M. Functional differentiation and genetic diversity of rice cation exchanger (CAX) genes and their potential use in rice improvement. Sci Rep 2024; 14:8642. [PMID: 38622172 PMCID: PMC11018787 DOI: 10.1038/s41598-024-58224-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 03/26/2024] [Indexed: 04/17/2024] Open
Abstract
Cation exchanger (CAX) genes play an important role in plant growth/development and response to biotic and abiotic stresses. Here, we tried to obtain important information on the functionalities and phenotypic effects of the CAX gene family by systematic analyses of their expression patterns, genetic diversity (gene CDS haplotypes, structural variations, gene presence/absence variations) in 3010 rice genomes and nine parents of 496 Huanghuazhan introgression lines, the frequency shifts of the predominant gcHaps at these loci to artificial selection during modern breeding, and their association with tolerances to several abiotic stresses. Significant amounts of variation also exist in the cis-regulatory elements (CREs) of the OsCAX gene promoters in 50 high-quality rice genomes. The functional differentiation of the OsCAX gene family was reflected primarily by their tissue- and development-specific expression patterns and varied responses to different treatments, by unique sets of CREs in their promoters, and by their associations with specific agronomic traits/abiotic stress tolerances. Our results indicated that OsCAX1a and OsCAX2, as general signal transporters, were involved in many processes of rice growth/development and responses to diverse environments, but they might be of less value in rice improvement. OsCAX1b, OsCAX1c, OsCAX3 and OsCAX4 were expected to be of potential value in rice improvement because of their associations with specific traits, responsiveness to specific abiotic stresses or phytohormones, and relatively high gcHap and CRE diversity. Our strategy was demonstrated to be highly efficient for obtaining important genetic information on genes/alleles of a specific gene family and can be used to systematically characterize other rice gene families.
Affiliation(s)
- Shangshu Lian
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yanjun Chen
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Yanyan Zhou
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Ting Feng
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Jingsi Chen
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Lunping Liang
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Yingzhi Qian
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Tao Huang
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Chenyang Zhang
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Fengcai Wu
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Wenli Zou
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Zhikang Li
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Lijun Meng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China.
- Min Li
- School of Agronomy, Anhui Agricultural University, Hefei, 230036, China.
3
Hu H, Li R, Zhao J, Batley J, Edwards D. Technological Development and Advances for Constructing and Analyzing Plant Pangenomes. Genome Biol Evol 2024; 16:evae081. [PMID: 38669452 PMCID: PMC11058698 DOI: 10.1093/gbe/evae081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/09/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024] Open
Abstract
A pangenome captures the genomic diversity for a species, derived from a collection of genetic sequences of diverse populations. Advances in sequencing technologies have given rise to three primary methods for pangenome construction and analysis: de novo assembly and comparison, reference genome-based iterative assembly, and graph-based pangenome construction. Each method presents advantages and challenges in processing varying amounts and structures of DNA sequencing data. With the emergence of high-quality genome assemblies and advanced bioinformatic tools, the graph-based pangenome is emerging as an advanced reference for exploring the biological and functional implications of genetic variations.
Affiliation(s)
- Haifei Hu
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
- Risheng Li
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
- College of Agriculture, South China Agricultural University, Guangzhou, Guangdong 510642, China
- Junliang Zhao
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou 510640, China
- Jacqueline Batley
- School of Biological Sciences, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences, University of Western Australia, Perth, WA, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA 6009, Australia
4
Hu H, Scheben A, Wang J, Li F, Li C, Edwards D, Zhao J. Unravelling inversions: Technological advances, challenges, and potential impact on crop breeding. Plant Biotechnol J 2024; 22:544-554. [PMID: 37961986 PMCID: PMC10893937 DOI: 10.1111/pbi.14224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 10/11/2023] [Accepted: 10/22/2023] [Indexed: 11/15/2023]
Abstract
Inversions, a type of chromosomal structural variation, significantly influence plant adaptation and gene functions by impacting gene expression and recombination rates. However, compared with other structural variations, their roles in functional biology and crop improvement remain largely unexplored. In this review, we highlight technological and methodological advancements that have allowed a comprehensive understanding of inversion variants through the pangenome framework and machine learning algorithms. Genome editing is an efficient method for inducing or reversing inversion mutations in plants, providing an effective mechanism to modify local recombination rates. Given the potential of inversions in crop breeding, we anticipate increasing attention on inversions from the scientific community in future research and breeding applications.
Affiliation(s)
- Haifei Hu
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou, China
- Armin Scheben
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
- Jian Wang
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou, China
- Fangping Li
- Guangdong Provincial Key Laboratory of Plant Molecular Breeding, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou, China
- Chengdao Li
- Western Crop Genetics Alliance, Centre for Crop & Food Innovation, Food Futures Institute, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, Western Australia, Australia
- David Edwards
- School of Biological Sciences, University of Western Australia, Perth, Western Australia, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, Western Australia, Australia
- Junliang Zhao
- Rice Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Genetics and Breeding of High Quality Rice in Southern China (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs & Guangdong Key Laboratory of New Technology in Rice Breeding & Guangdong Rice Engineering Laboratory, Guangzhou, China
5
Zheng Z, Zhu M, Zhang J, Liu X, Hou L, Liu W, Yuan S, Luo C, Yao X, Liu J, Yang Y. A sequence-aware merger of genomic structural variations at population scale. Nat Commun 2024; 15:960. [PMID: 38307885 PMCID: PMC10837428 DOI: 10.1038/s41467-024-45244-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 01/18/2024] [Indexed: 02/04/2024] Open
Abstract
Merging structural variations (SVs) at the population level presents a significant challenge, yet it is essential for conducting comprehensive genotypic analyses, especially in the era of pangenomics. Here, we introduce PanPop, a tool that utilizes an advanced sequence-aware SV merging algorithm to efficiently merge SVs of various types. We demonstrate that PanPop can merge and optimize the majority of multiallelic SVs into informative biallelic variants. We show its superior precision and lower rates of missing data compared to alternative software solutions. Our approach not only enables the filtering of SVs by leveraging multiple SV callers for enhanced accuracy but also facilitates the accurate merging of large-scale population SVs. These capabilities of PanPop will help to accelerate future SV-related studies.
Affiliation(s)
- Zeyu Zheng
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Mingjia Zhu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Jin Zhang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Xinfeng Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Liqiang Hou
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Wenyu Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Shuai Yuan
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Changhong Luo
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Xinhao Yao
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Jianquan Liu
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.
- Yongzhi Yang
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou, China.
6
Zhou Y, Kathiresan N, Yu Z, Rivera LF, Yang Y, Thimma M, Manickam K, Chebotarov D, Mauleon R, Chougule K, Wei S, Gao T, Green CD, Zuccolo A, Xie W, Ware D, Zhang J, McNally KL, Wing RA. A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset. BMC Biol 2024; 22:13. [PMID: 38273258 PMCID: PMC10809545 DOI: 10.1186/s12915-024-01820-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/09/2024] [Indexed: 01/27/2024] Open
Abstract
BACKGROUND: Single-nucleotide polymorphisms (SNPs) are the most widely used form of molecular genetic variation studies. As reference genomes and resequencing data sets expand exponentially, tools must be in place to call SNPs at a similar pace. The genome analysis toolkit (GATK) is one of the most widely used SNP calling software tools publicly available, but unfortunately, high-performance computing versions of this tool have yet to become widely available and affordable. RESULTS: Here we report an open-source high-performance computing genome variant calling workflow (HPC-GVCW) for GATK that can run on multiple computing platforms from supercomputers to desktop machines. We benchmarked HPC-GVCW on multiple crop species for performance and accuracy with comparable results with previously published reports (using GATK alone). Finally, we used HPC-GVCW in production mode to call SNPs on a "subpopulation aware" 16-genome rice reference panel with ~3000 resequenced rice accessions. The entire process took ~16 weeks and resulted in the identification of an average of 27.3 M SNPs/genome and the discovery of ~2.3 million novel SNPs that were not present in the flagship reference genome for rice (i.e., IRGSP RefSeq). CONCLUSIONS: This study developed an open-source pipeline (HPC-GVCW) to run GATK on HPC platforms, which significantly improved the speed at which SNPs can be called. The workflow is widely applicable as demonstrated successfully for four major crop species with genomes ranging in size from 400 Mb to 2.4 Gb. Using HPC-GVCW in production mode to call SNPs on a 25 multi-crop-reference genome data set produced over 1.1 billion SNPs that were publicly released for functional and breeding studies. For rice, many novel SNPs were identified and were found to reside within genes and open chromatin regions that are predicted to have functional consequences. Combined, our results demonstrate the usefulness of combining a high-performance SNP calling architecture solution with a subpopulation-aware reference genome panel for rapid SNP discovery and public deployment.
Affiliation(s)
- Yong Zhou
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Arizona Genomics Institute (AGI), School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, USA
- Nagarajan Kathiresan
- KAUST Supercomputing Laboratory (KSL), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Zhichao Yu
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Luis F Rivera
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Yujian Yang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Manjula Thimma
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Keerthana Manickam
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Dmytro Chebotarov
- International Rice Research Institute (IRRI), Los Baños, Laguna, 4031, Philippines
- Ramil Mauleon
- International Rice Research Institute (IRRI), Los Baños, Laguna, 4031, Philippines
- Kapeel Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Sharon Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Tingting Gao
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Carl D Green
- Information Technology Department, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Andrea Zuccolo
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia
- Crop Science Research Center (CSRC), Scuola Superiore Sant'Anna, Pisa, 56127, Italy
- Weibo Xie
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- USDA ARS NEA Plant, Soil & Nutrition Laboratory Research Unit, Ithaca, NY, 14853, USA
- Jianwei Zhang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China
- Kenneth L McNally
- International Rice Research Institute (IRRI), Los Baños, Laguna, 4031, Philippines
- Rod A Wing
- Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
- Arizona Genomics Institute (AGI), School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, USA.
- International Rice Research Institute (IRRI), Los Baños, Laguna, 4031, Philippines.
7
Ramsbottom KA, Prakash A, Riverol YP, Camacho OM, Sun Z, Kundu DJ, Bowler-Barnett E, Martin M, Fan J, Chebotarov D, McNally KL, Deutsch EW, Vizcaíno JA, Jones AR. A meta-analysis of rice phosphoproteomics data to understand variation in cell signalling across the rice pan-genome. bioRxiv 2023:2023.11.17.567512. [PMID: 38014076 PMCID: PMC10680829 DOI: 10.1101/2023.11.17.567512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Phosphorylation is the most studied post-translational modification, and has multiple biological functions. In this study, we have re-analysed publicly available mass spectrometry proteomics datasets enriched for phosphopeptides from Asian rice (Oryza sativa). In total we identified 15,522 phosphosites on serine, threonine and tyrosine residues on rice proteins. We identified sequence motifs for phosphosites, and linked motifs to enrichment of different biological processes, indicating different downstream regulation likely caused by different kinase groups. We cross-referenced phosphosites against the rice 3,000 genomes, to identify single amino acid variations (SAAVs) within or proximal to phosphosites that could cause loss of a site in a given rice variety. The data was clustered to identify groups of sites with similar patterns across rice family groups, for example those highly conserved in Japonica, but mostly absent in Aus type rice varieties - known to have different responses to drought. These resources can assist rice researchers to discover alleles with significantly different functional effects across rice varieties. The data has been loaded into UniProt Knowledge-Base - enabling researchers to visualise sites alongside other data on rice proteins e.g. structural models from AlphaFold2, PeptideAtlas and the PRIDE database - enabling visualisation of source evidence, including scores and supporting mass spectra.
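The cross-referencing step described in this abstract (finding SAAVs within or proximal to a phosphosite) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `classify_saavs`, the input layout (variety name mapped to variant residue positions on one protein) and the `flank` of 5 residues are all assumptions made here for illustration.

```python
from typing import Dict, List, Tuple


def classify_saavs(
    site_pos: int,
    saavs: Dict[str, List[int]],
    flank: int = 5,  # hypothetical definition of "proximal", in residues
) -> Dict[str, List[Tuple[int, str]]]:
    """For one phosphosite, report per variety which SAAVs hit or flank it.

    site_pos: 1-based residue position of the phosphosite on the protein.
    saavs: variety name -> residue positions carrying an amino acid variation.
    Varieties with no SAAV at or near the site are left out of the result.
    """
    result: Dict[str, List[Tuple[int, str]]] = {}
    for variety, positions in saavs.items():
        hits: List[Tuple[int, str]] = []
        for pos in sorted(positions):
            if pos == site_pos:
                hits.append((pos, "at_site"))    # variation replaces the S/T/Y itself
            elif abs(pos - site_pos) <= flank:
                hits.append((pos, "proximal"))   # variation may disturb the kinase motif
        if hits:
            result[variety] = hits
    return result
```

Calling `classify_saavs(42, {"var_a": [42, 100], "var_b": [40]})` flags `var_a` as carrying a variation at the site and `var_b` as carrying a proximal one; the variety names are placeholders.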
Affiliation(s)
- Kerry A Ramsbottom
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7BE, United Kingdom
- Ananth Prakash
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Yasset Perez Riverol
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Oscar Martin Camacho
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7BE, United Kingdom
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Deepti J. Kundu
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Emily Bowler-Barnett
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Maria Martin
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Jun Fan
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Dmytro Chebotarov
- International Rice Research Institute, DAPO 7777, Manila 1301, Philippines
- Kenneth L McNally
- International Rice Research Institute, DAPO 7777, Manila 1301, Philippines
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, United Kingdom
- Andrew R Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 7BE, United Kingdom
8
Contreras-Moreira B, Saraf S, Naamati G, Casas AM, Amberkar SS, Flicek P, Jones AR, Dyer S. GET_PANGENES: calling pangenes from plant genome alignments confirms presence-absence variation. Genome Biol 2023; 24:223. [PMID: 37798615 PMCID: PMC10552430 DOI: 10.1186/s13059-023-03071-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 09/21/2023] [Indexed: 10/07/2023] Open
Abstract
Crop pangenomes made from individual cultivar assemblies promise easy access to conserved genes, but genome content variability and inconsistent identifiers hamper their exploration. To address this, we define pangenes, which summarize a species' coding potential and link back to original annotations. The protocol get_pangenes performs whole genome alignments (WGA) to call syntenic gene models based on coordinate overlaps. A benchmark with small and large plant genomes shows that pangenes recapitulate phylogeny-based orthologies and produce complete soft-core gene sets. Moreover, WGAs support lift-over and help confirm gene presence-absence variation. Source code and documentation: https://github.com/Ensembl/plant-scripts.
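The idea of pairing gene models by coordinate overlap can be sketched as follows. This is an illustration only and not the get_pangenes implementation: it assumes the genes of a second genome have already been projected onto the first genome's coordinates by a whole genome alignment, and the names `Gene`, `overlap_bp`, `match_by_overlap` and the 0.5 overlap threshold are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlap_bp(a: Gene, b: Gene) -> int:
    """Number of bases shared by two intervals; 0 if on different sequences."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def match_by_overlap(
    ref_genes: List[Gene],
    mapped_genes: List[Gene],
    min_fraction: float = 0.5,  # assumed threshold, relative to the shorter gene
) -> List[Tuple[str, str]]:
    """Pair genes whose overlap covers at least min_fraction of the shorter one.

    A simple all-against-all scan; adequate for a sketch, too slow for whole genomes.
    """
    pairs = []
    for r in ref_genes:
        for m in mapped_genes:
            ov = overlap_bp(r, m)
            shorter = min(r.end - r.start + 1, m.end - m.start + 1)
            if ov and ov / shorter >= min_fraction:
                pairs.append((r.gene_id, m.gene_id))
    return pairs
```

Reference genes left without a partner after this step are candidates for presence-absence variation, which would still need confirming against the alignment itself.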
Affiliation(s)
- Bruno Contreras-Moreira
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
- Estación Experimental Aula Dei-CSIC, 50059, Zaragoza, Spain.
- Shradha Saraf
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Ana M Casas
- Estación Experimental Aula Dei-CSIC, 50059, Zaragoza, Spain
- Sandeep S Amberkar
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Andrew R Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
9
Abstract
We review current methods and bioinformatics tools for estimating text complexity (information and entropy measures). The search for DNA regions with extreme statistical characteristics, such as low-complexity regions, is important for biophysical models of chromosome function and gene transcription regulation at genome scale. We discuss complexity profiling for segmentation and delineation of genome sequences, the search for genome repeats and transposable elements, and applications to next-generation sequencing reads. We review complexity methods and new application fields: analysis of mutation hotspot loci, analysis of short sequencing reads with quality control, and alignment-free genome comparisons. Algorithms implementing various numerical measures of text complexity, including combinatorial and linguistic measures, were developed before the genome sequencing era. A series of tools for estimating sequence complexity use compression approaches, mainly modifications of Lempel-Ziv compression. Most of the tools are available online, providing large-scale service for whole-genome analysis. Novel machine learning applications for classification of complete genome sequences also include sequence compression and complexity algorithms. We present a comparison of the complexity methods on different sequence sets and their applications to the analysis of gene transcription regulatory regions. Furthermore, we discuss approaches to and applications of sequence complexity for proteins. Complexity measures for amino acid sequences can be calculated by the same entropy- and compression-based algorithms, but the functional and evolutionary roles of low-complexity regions in proteins have specific features that differ from those in DNA. The tools for protein sequence complexity are aimed at protein structural constraints. It has been shown that low-complexity regions in protein sequences are evolutionarily conserved and have important biological and structural functions. Finally, we summarize recent findings in large-scale genome complexity comparison and applications to coronavirus genome analysis.
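The two families of measures named in this abstract, entropy-based and compression-based, can be illustrated with a short sketch. It does not reproduce any specific tool from the review: the function names and window parameters are chosen here for illustration, and zlib's DEFLATE (an LZ77-based compressor) stands in for the Lempel-Ziv modifications the abstract mentions.

```python
import math
import zlib
from collections import Counter
from typing import List, Tuple


def shannon_entropy(seq: str) -> float:
    """Shannon entropy of the symbol frequencies in seq, in bits per symbol."""
    if not seq:
        return 0.0
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())


def compression_ratio(seq: str) -> float:
    """Compressed size divided by raw size; lower values mean lower complexity."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def entropy_profile(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Entropy in sliding windows, as (window start, entropy) pairs."""
    return [
        (start, shannon_entropy(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

For example, `entropy_profile("AAAAAAAA" + "ACGTACGT", 8, 8)` gives an entropy of 0 bits for the homopolymer window and 2 bits for the window with all four bases in equal proportion. Note that the compression ratio carries a fixed header overhead, so it is only meaningful for comparing sequences of similar length.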
Affiliation(s)
- Yuriy L. Orlov
- The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Russian Ministry of Health (Sechenov University), Moscow, 119991 Russia
- Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
- Agrarian and Technological Institute, Peoples’ Friendship University of Russia, 117198 Moscow, Russia
- Nina G. Orlova
- Department of Mathematics, Financial University under the Government of the Russian Federation, Moscow, 125167 Russia
10
|
van Dijk EL, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C. Genomics in the long-read sequencing era. Trends Genet 2023; 39:649-671. [PMID: 37230864 DOI: 10.1016/j.tig.2023.04.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/10/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered from technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy, and bioinformatics tools have strongly improved. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies, focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.
Affiliation(s)
- Erwin L van Dijk
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France.
- Delphine Naquin
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Kévin Gorrichon
  - National Center of Human Genomics Research (CNRGH), 91000 Évry-Courcouronnes, France
- Yan Jaszczyszyn
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Rania Ouazahrou
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Claude Thermes
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Céline Hernandez
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
|
11
|
Yu Z, Chen Y, Zhou Y, Zhang Y, Li M, Ouyang Y, Chebotarov D, Mauleon R, Zhao H, Xie W, McNally KL, Wing RA, Guo W, Zhang J. Rice Gene Index: A comprehensive pan-genome database for comparative and functional genomics of Asian rice. Mol Plant 2023; 16:798-801. [PMID: 36966359 DOI: 10.1016/j.molp.2023.03.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/15/2022] [Revised: 02/07/2023] [Accepted: 03/21/2023] [Indexed: 05/04/2023]
Affiliation(s)
- Zhichao Yu
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Yongming Chen
  - Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yong Zhou
  - Center for Desert Agriculture, Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Yulu Zhang
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Mengyuan Li
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Yidan Ouyang
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Dmytro Chebotarov
  - International Rice Research Institute (IRRI), Los Baños, Laguna 4031, Philippines
- Ramil Mauleon
  - International Rice Research Institute (IRRI), Los Baños, Laguna 4031, Philippines
- Hu Zhao
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Weibo Xie
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Kenneth L McNally
  - International Rice Research Institute (IRRI), Los Baños, Laguna 4031, Philippines
- Rod A Wing
  - Center for Desert Agriculture, Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA.
- Weilong Guo
  - Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China.
- Jianwei Zhang
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China.
|