1
Zhu T, Chu Y, Niu G, Pan R, Chen M, Cheng Y, Zhang Y, Li Z, Jiang S, Hao L, Zou D, Xu T, Zhang Z. Editome Disease Knowledgebase v2.0: an updated resource of editome-disease associations through literature curation and integrative analysis. Bioinformatics Advances 2025; 5:vbaf012. [PMID: 39968378 PMCID: PMC11835235 DOI: 10.1093/bioadv/vbaf012] [Received: 11/13/2024] [Revised: 01/09/2025] [Accepted: 01/23/2025] [Indexed: 02/20/2025]
Abstract
Motivation: Editome Disease Knowledgebase (EDK) is a curated resource of associations between the RNA editome and human diseases. Since its first release in 2018, a number of studies have discovered previously uncharacterized editome-disease associations and generated an abundance of RNA editing datasets. Thus, it is desirable to update EDK substantially by incorporating more editome-disease associations as well as their related editing profiles. Results: Here, we present EDK v2.0, an updated version of editome-disease associations based on both literature curation and integrative analysis. EDK v2.0 incorporates a curated collection of 1097 editome-disease associations involving 115 diseases from 321 publications. Meanwhile, based on a standardized pipeline, EDK v2.0 provides RNA editing profiles from 48 datasets covering 2536 samples across 55 diseases. Through differential analysis of RNA editing, it further identifies a total of 7190 differentially edited genes and 86 242 differential editing sites (DESs), leading to 266 339 DES-disease associations. Moreover, curated lists of 28 160 cis-RNA editing QTL associations, 458 187 DES-RNA binding protein associations, and 21 DES-RNA secondary structure associations are annotated and added to EDK v2.0. Additionally, it is equipped with a series of user-friendly tools to facilitate online analysis of RNA editing. Availability and implementation: https://ngdc.cncb.ac.cn/edk/.
Affiliation(s)
- Tongtong Zhu
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yuan Chu
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Guangyi Niu
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Rong Pan
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Ming Chen
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yuanyuan Cheng
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yuansheng Zhang
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhao Li
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Shuai Jiang
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Lili Hao
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Dong Zou
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Tianyi Xu
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Zhang Zhang
  - National Genomics Data Center, China National Center for Bioinformation, Beijing 100101, China
  - Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
2
Kandror EK, Wang A, Carriere M, Peterson A, Liao W, Tjärnberg A, Fung JH, Mahbubani KT, Loper J, Pangburn W, Xu Y, Saeb-Parsy K, Rabadan R, Maniatis T, Rizvi AH. Enhancer Dynamics and Spatial Organization Drive Anatomically Restricted Cellular States in the Human Spinal Cord. bioRxiv 2025:2025.01.10.632483. [PMID: 39829819 PMCID: PMC11741326 DOI: 10.1101/2025.01.10.632483] [Indexed: 01/22/2025]
Abstract
Here, we report the spatial organization of RNA transcription and associated enhancer dynamics in the human spinal cord at single-cell and single-molecule resolution. We expand traditional multiomic measurements to reveal epigenetically poised and bivalent active transcriptional enhancer states that define cell type specification. Simultaneous detection of chromatin accessibility and histone modifications in spinal cord nuclei reveals previously unobserved cell-type specific cryptic enhancer activity, in which transcriptional activation is uncoupled from chromatin accessibility. Such cryptic enhancers define both stable cell type identity and transitions between cells undergoing differentiation. We also define glial cell gene regulatory networks that reorganize along the rostrocaudal axis, revealing anatomical differences in gene regulation. Finally, we identify the spatial organization of cells into distinct cellular organizations and address the functional significance of this observation in the context of paracrine signaling. We conclude that cellular diversity is best captured through the lens of enhancer state and intercellular interactions that drive transitions in cellular state. This study provides fundamental insights into the cellular organization of the healthy human spinal cord.
Affiliation(s)
- Elena K. Kandror
  - Department of Neuroscience and Waisman Center, University of Wisconsin-Madison
- Anqi Wang
  - Program for Mathematical Genomics, Department of Systems Biology, Columbia University Medical Center
- Alexis Peterson
  - Department of Neuroscience and Waisman Center, University of Wisconsin-Madison
- Andreas Tjärnberg
  - Department of Neuroscience and Waisman Center, University of Wisconsin-Madison
- Jun Hou Fung
  - Program for Mathematical Genomics, Department of Systems Biology, Columbia University Medical Center
- Krishnaa T. Mahbubani
  - Cambridge Biorepository for Translational Medicine, Cambridge NIHR Biomedical Research Centre, Cambridge, UK
  - Department of Surgery, University of Cambridge, Cambridge, UK
  - Department of Haematology, Cambridge Stem Cell Institute, Cambridge, UK
- Jackson Loper
  - Department of Statistics, University of Michigan Ann Arbor
- William Pangburn
  - Zuckerman Mind Brain Behavior Institute and Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center
- Yuchen Xu
  - Department of Neuroscience and Waisman Center, University of Wisconsin-Madison
- Kourosh Saeb-Parsy
  - Cambridge Biorepository for Translational Medicine, Cambridge NIHR Biomedical Research Centre, Cambridge, UK
  - Department of Surgery, University of Cambridge, Cambridge, UK
- Raul Rabadan
  - Program for Mathematical Genomics, Department of Systems Biology, Columbia University Medical Center
- Tom Maniatis
  - New York Genome Center
  - Zuckerman Mind Brain Behavior Institute and Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center
- Abbas H. Rizvi
  - Department of Neuroscience and Waisman Center, University of Wisconsin-Madison
  - Lead contact
3
Salvador-Martínez I, Murga-Moreno J, Nieto JC, Alsinet C, Enard D, Heyn H. Adaptation in human immune cells residing in tissues at the frontline of infections. Nat Commun 2024; 15:10329. [PMID: 39609395 PMCID: PMC11605006 DOI: 10.1038/s41467-024-54603-5] [Received: 09/28/2023] [Accepted: 11/14/2024] [Indexed: 11/30/2024]
Abstract
Human immune cells are under constant evolutionary pressure, primarily through their role as first line of defence against pathogens. Most studies on immune adaptation are, however, based on protein-coding genes without considering their cellular context. Here, using data from the Human Cell Atlas, we infer the gene adaptation rate of the human immune landscape at cellular resolution. We find abundant cell types, like progenitor cells during development and adult cells in barrier tissues, to harbour significantly increased adaptation rates. We confirm the adaptation of tissue-resident T and NK cells in the adult lung located in compartments directly facing external challenges, such as respiratory pathogens. Analysing human iPSC-derived macrophages responding to various challenges, we find adaptation in early immune responses. Together, our study suggests host benefits to control pathogen spread at early stages of infection, providing a retrospect of forces that shaped the complexity, architecture, and function of the human body.
Affiliation(s)
- Jesus Murga-Moreno
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Juan C Nieto
  - CNAG, Centro Nacional de Análisis Genómico, Barcelona, Spain
- Clara Alsinet
  - CNAG, Centro Nacional de Análisis Genómico, Barcelona, Spain
- David Enard
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Holger Heyn
  - CNAG, Centro Nacional de Análisis Genómico, Barcelona, Spain
  - Universitat de Barcelona (UB), Barcelona, Spain
  - ICREA, Barcelona, Spain
4
Matsumoto Y, Yik-Lok Chung C, Isobe S, Sakamoto M, Lin X, Chan TF, Hirakawa H, Ishihara G, Lam HM, Nakayama S, Sasamoto S, Tanizawa Y, Watanabe A, Watanabe K, Yagura M, Niimura Y, Nakamura Y. Chromosome-scale assembly with improved annotation provides insights into breed-wide genomic structure and diversity in domestic cats. J Adv Res 2024:S2090-1232(24)00478-8. [PMID: 39490737 DOI: 10.1016/j.jare.2024.10.023] [Received: 02/09/2024] [Revised: 10/16/2024] [Accepted: 10/18/2024] [Indexed: 11/05/2024]
Abstract
Introduction: Comprehensive genomic resources offer insights into biological features, including trait- and disease-related genetic loci. The current reference genome assembly for the domestic cat (Felis catus), Felis_Catus_9.0 (felCat9), derived from sequences of the Abyssinian cat, may inadequately represent the general cat population, limiting the extent of deducible genetic variations. Objectives: The goal was to develop Anicom American Shorthair 1.0 (AnAms1.0), a reference-grade chromosome-scale cat genome assembly. Methods: In contrast to prior assemblies relying on Abyssinian cat sequences, AnAms1.0 was constructed from the sequences of the more popular American Shorthair breed, which is related to more breeds than the Abyssinian cat. By combining advanced genomics technologies, including PacBio long-read sequencing and Hi-C- and optical mapping data-based sequence scaffolding, we compared AnAms1.0 to existing Felidae genome assemblies (20 scaffolds, scaffold N50 > 150 Mbp). Homology-based and ab initio gene annotation through Iso-Seq and RNA-Seq was used to identify new coding genes and splice variants. Results: AnAms1.0 demonstrated greater contiguity and accuracy than existing Felidae genome assemblies. Using AnAms1.0, we identified over 1.5 thousand structural variants and 29 million repetitions compared to felCat9. Additionally, we identified > 1,600 novel protein-coding genes. Notably, olfactory receptor structural variants and cardiomyopathy-related variants were identified. Conclusion: AnAms1.0 facilitates the discovery of novel genes related to normal and disease phenotypes in domestic cats. The analyzed data are publicly accessible on Cats-I (https://cat.annotation.jp/), which we established as a platform for accumulating and sharing genomic resources to discover novel genetic traits and advance veterinary medicine.
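For readers unfamiliar with the scaffold N50 statistic quoted in this abstract, the following is a minimal sketch of how N50 is conventionally computed from a list of scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the AnAms1.0 assembly or its tooling.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, arbitrary units).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

With these toy values the total is 300, and the two longest scaffolds (80 + 70) already reach half of it, so the sketch reports 70.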
Affiliation(s)
- Yuki Matsumoto
  - Research and Development Section, Anicom Specialty Medical Institute Inc., Yokohama, Kanagawa, Japan
  - Data Science Center, Azabu University, Sagamihara, Kanagawa, Japan
- Claire Yik-Lok Chung
  - School of Life Sciences and the Center for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong Special Administrative Region
- Sachiko Isobe
  - Kazusa DNA Research Institute, Kisarazu, Chiba, Japan
- Mika Sakamoto
  - National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka, Japan
- Xiao Lin
  - School of Life Sciences and the Center for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong Special Administrative Region
- Ting-Fung Chan
  - School of Life Sciences and the Center for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong Special Administrative Region
- Genki Ishihara
  - Research and Development Section, Anicom Specialty Medical Institute Inc., Yokohama, Kanagawa, Japan
- Hon-Ming Lam
  - School of Life Sciences and the Center for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong Special Administrative Region
- Yasuhiro Tanizawa
  - National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka, Japan
- Kei Watanabe
  - Research and Development Section, Anicom Specialty Medical Institute Inc., Yokohama, Kanagawa, Japan
- Masaru Yagura
  - National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka, Japan
- Yoshihito Niimura
  - Department of Veterinary Sciences, Faculty of Agriculture, University of Miyazaki, Miyazaki, Miyazaki, Japan
- Yasukazu Nakamura
  - National Institute of Genetics, Research Organization of Information and Systems, Mishima, Shizuoka, Japan
5
Sario S, Marques JP, Farelo L, Afonso S, Santos C, Melo-Ferreira J. Dissecting the invasion history of Spotted-Wing Drosophila (Drosophila suzukii) in Portugal using genomic data. BMC Genomics 2024; 25:813. [PMID: 39210249 PMCID: PMC11360492 DOI: 10.1186/s12864-024-10739-8] [Received: 04/03/2024] [Accepted: 08/23/2024] [Indexed: 09/04/2024]
Abstract
Background: The invasive pest Spotted-Wing Drosophila, Drosophila suzukii (Matsumura), causes extensive damage and production losses of soft-skinned fruits. Native to Asia, the species has now spread worldwide, with first reports in Portugal in 2012. In this study, we focus on the genomic signatures of the recent Portuguese invasion, in the context of worldwide patterns established in previous works. We analyzed whole genome pool sequencing data from three Portuguese populations (N = 240) sampled in 2019 and 2021. Results: The correlation of allele frequencies suggested that Portuguese populations are related to South European ones, indicating a Mediterranean invasion route. While two populations exhibited levels of genetic variation comparable to others in the invasive range, a third showed low levels of genetic diversity, which may result from a recent colonization of the region. Genome-wide analyses of natural selection identified ten genes previously associated with D. suzukii's invasive capacity, which may have contributed to the species' success in Portugal. Additionally, we pinpointed six genes evolving under positive selection across Portuguese populations but not in European ones, which is indicative of local adaptation. One of these genes, nAChRalpha7, encodes a nicotinic acetylcholine receptor, a class of receptors known to be targeted by insecticides widely used for D. suzukii control, such as neonicotinoids and spinosyns. Although spinosyn resistance has been associated with mutations in nAChRalpha6 in other Drosophila species, the putative role of nAChRalpha7 in insecticide resistance and local adaptation in Portuguese D. suzukii populations encourages future investigation. Conclusions: Our results highlight the complex nature of rapid species invasions and the role of rapid local adaptation in determining the invasive capacity of these species.
Affiliation(s)
- Sara Sario
  - Biology Department, Faculty of Sciences, University of Porto, Porto, 4169-007, Portugal
  - LAQV-REQUIMTE, Faculty of Sciences, University of Porto, Porto, 4050-453, Portugal
- João P Marques
  - Centro de Investigação em Biodiversidade e Recursos Genéticos, CIBIO, InBIO Laboratório Associado, Universidade do Porto, Campus de Vairão, Vairão, 4485-661, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, 4485-661, Portugal
- Liliana Farelo
  - Centro de Investigação em Biodiversidade e Recursos Genéticos, CIBIO, InBIO Laboratório Associado, Universidade do Porto, Campus de Vairão, Vairão, 4485-661, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, 4485-661, Portugal
- Sandra Afonso
  - Centro de Investigação em Biodiversidade e Recursos Genéticos, CIBIO, InBIO Laboratório Associado, Universidade do Porto, Campus de Vairão, Vairão, 4485-661, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, 4485-661, Portugal
- Conceição Santos
  - Biology Department, Faculty of Sciences, University of Porto, Porto, 4169-007, Portugal
  - LAQV-REQUIMTE, Faculty of Sciences, University of Porto, Porto, 4050-453, Portugal
- José Melo-Ferreira
  - Biology Department, Faculty of Sciences, University of Porto, Porto, 4169-007, Portugal
  - Centro de Investigação em Biodiversidade e Recursos Genéticos, CIBIO, InBIO Laboratório Associado, Universidade do Porto, Campus de Vairão, Vairão, 4485-661, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, 4485-661, Portugal
6
Elfman J, Goins L, Heller T, Singh S, Wang YH, Li H. Discovery of a polymorphic gene fusion via bottom-up chimeric RNA prediction. Nucleic Acids Res 2024; 52:4409-4421. [PMID: 38587197 PMCID: PMC11077074 DOI: 10.1093/nar/gkae258] [Received: 12/17/2022] [Accepted: 03/27/2024] [Indexed: 04/09/2024]
Abstract
Gene fusions and their chimeric products are commonly linked with cancer. However, recent studies have found chimeric transcripts in non-cancer tissues and cell lines. Large-scale efforts to annotate structural variations have identified gene fusions capable of generating chimeric transcripts even in normal tissues. In this study, we present a bottom-up approach targeting population-specific chimeric RNAs, identifying 58 such instances in the GTEx cohort, including notable cases such as SUZ12P1-CRLF3, TFG-ADGRG7 and TRPM4-PPFIA3, which possess distinct patterns across different ancestry groups. We provide direct evidence for an additional 29 polymorphic chimeric RNAs with associated structural variants, revealing 13 novel rare structural variants. Additionally, we utilize the All of Us dataset and a large cohort of clinical samples to characterize the association of the SUZ12P1-CRLF3-causing variant with patient phenotypes. Our study showcases SUZ12P1-CRLF3 as a representative example, illustrating the identification of elusive structural variants by focusing on those producing population-specific fusion transcripts.
Affiliation(s)
- Justin Elfman
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Lynette Goins
  - Department of Biological Sciences, Clemson University, Clemson, SC 29631, USA
- Tessa Heller
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Sandeep Singh
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
  - Computational Toxicology Facility, CSIR-Indian Institute of Toxicology Research, Lucknow, 226001, Uttar Pradesh, India
- Yuh-Hwa Wang
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
- Hui Li
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA 22903, USA
  - Department of Pathology, University of Virginia, Charlottesville, VA 22903, USA
7
Lataretu M, Drechsel O, Kmiecinski R, Trappe K, Hölzer M, Fuchs S. Lessons learned: overcoming common challenges in reconstructing the SARS-CoV-2 genome from short-read sequencing data via CoVpipe2. F1000Res 2024; 12:1091. [PMID: 38716230 PMCID: PMC11074694 DOI: 10.12688/f1000research.136683.1] [Accepted: 04/11/2024] [Indexed: 05/12/2024]
Abstract
Background: Accurate genome sequences form the basis for genomic surveillance programs, the added value of which was impressively demonstrated during the COVID-19 pandemic by tracing transmission chains, discovering new viral lineages and mutations, and assessing them for infectiousness and resistance to available treatments. Amplicon strategies employing Illumina sequencing have become widely established for variant detection and reference-based reconstruction of SARS-CoV-2 genomes, and are routine bioinformatics tasks. Yet, specific challenges arise when analyzing amplicon data, for example, when crucial and even lineage-determining mutations occur near primer sites. Methods: We present CoVpipe2, a bioinformatics workflow developed at the Public Health Institute of Germany to accurately reconstruct SARS-CoV-2 genomes from short-read sequencing data. The decisive factor here is the reliable, accurate, and rapid reconstruction of genomes, considering the specifics of the used sequencing protocol. Besides fundamental tasks like quality control, mapping, variant calling, and consensus generation, we also implemented additional features to ease the detection of mixed samples and recombinants. Results: We highlight common pitfalls in primer clipping, detecting heterozygote variants, and dealing with low-coverage regions and deletions. We introduce CoVpipe2 to address the above challenges and have compared and successfully validated the pipeline against selected publicly available benchmark datasets. CoVpipe2 features high usability, reproducibility, and a modular design that specifically addresses the characteristics of short-read amplicon protocols but can also be used for whole-genome short-read sequencing data. Conclusions: CoVpipe2 has seen multiple improvement cycles and is continuously maintained alongside frequently updated primer schemes and new developments in the scientific community. Our pipeline is easy to set up and use and can serve as a blueprint for other pathogens in the future due to its flexibility and modularity, providing a long-term perspective for continuous support. CoVpipe2 is written in Nextflow and is freely accessible from https://github.com/rki-mf1/CoVpipe2 under the GPL3 license.
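One of the pitfalls named in this abstract, handling low-coverage regions during consensus generation, can be illustrated with a generic sketch: positions whose read depth falls below a threshold are replaced by `N` rather than reported as a confident base call. This is not CoVpipe2's implementation; the function name `mask_low_coverage`, the per-position depth list, and the default threshold of 20 are assumptions chosen for illustration only.

```python
def mask_low_coverage(consensus, depths, min_depth=20):
    """Replace bases supported by fewer than min_depth reads with 'N'.

    consensus: consensus sequence as a string
    depths:    per-position read depth, same length as consensus
    min_depth: hypothetical threshold; real pipelines make this configurable
    """
    if len(consensus) != len(depths):
        raise ValueError("consensus and depths must have the same length")
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(consensus, depths)
    )


# Toy example: positions 2, 4 and 6 are below the threshold.
print(mask_low_coverage("ACGTAC", [30, 5, 20, 19, 100, 0]))  # ANGNAN
```

Masking in this way keeps poorly supported positions from being mistaken for reference-identical sequence in downstream lineage assignment.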
Affiliation(s)
- Marie Lataretu
  - Genome Competence Center (MF1), Robert Koch Institute, Berlin, 13353, Germany
- Oliver Drechsel
  - Genome Competence Center (MF1), Robert Koch Institute, Berlin, 13353, Germany
- René Kmiecinski
  - Genome Competence Center (MF1), Robert Koch Institute, Berlin, 13353, Germany
- Kathrin Trappe
  - Genome Competence Center (MF1), Robert Koch Institute, Berlin, 13353, Germany
- Martin Hölzer
  - Genome Competence Center (MF1), Robert Koch Institute, Berlin, 13353, Germany
- Stephan Fuchs
  - Genome Competence Center (MF1), Robert Koch Institute, Berlin, 13353, Germany
8
Kim KD, Shim J, Hwang JH, Kim D, El Baidouri M, Park S, Song J, Yu Y, Lee K, Ahn BO, Hong SY, Chin JH. Chromosome-level genome assembly of milk thistle (Silybum marianum (L.) Gaertn.). Sci Data 2024; 11:342. [PMID: 38580686 PMCID: PMC10997770 DOI: 10.1038/s41597-024-03178-3] [Received: 11/12/2023] [Accepted: 03/22/2024] [Indexed: 04/07/2024]
Abstract
Silybum marianum (L.) Gaertn., commonly known as milk thistle, is a medicinal plant belonging to the Asteraceae family. This plant has been recognized for its medicinal properties for over 2,000 years. However, the genome of this plant remains largely undiscovered, having no reference genome at a chromosomal level. Here, we assembled the chromosome-level genome of S. marianum, allowing for the annotation of 53,552 genes and the identification of transposable elements comprising 58% of the genome. The genome assembly from this study showed 99.1% completeness as determined by BUSCO assessment, while the previous assembly (ASM154182v1) showed 36.7%. Functional annotation of the predicted genes showed 50,329 genes (94% of total genes) with known protein functions in public databases. Comparative genome analysis among Asteraceae plants revealed a striking conservation of collinearity between S. marianum and C. cardunculus. The genomic information generated from this study will be a valuable resource for milk thistle breeding and for use by the larger research community.
Affiliation(s)
- Kyung Do Kim
  - Department of Biosciences and Bioinformatics, Myongji University, Yongin, 17058, Korea
- Ji-Hun Hwang
  - Department of Biosciences and Bioinformatics, Myongji University, Yongin, 17058, Korea
- Daegwan Kim
  - Department of Research and Development, DNACARE Co. Ltd., Seoul, 06126, Korea
- Moaine El Baidouri
  - Laboratoire Génome et Développement des Plantes, Centre National de la Recherche Scientifique (CNRS), Perpignan, France
  - Laboratoire Génome et Développement des Plantes, University of Perpignan Via Domitia, Perpignan, France
- Soyeon Park
  - Department of Biosciences and Bioinformatics, Myongji University, Yongin, 17058, Korea
- Jiyong Song
  - Department of Biosciences and Bioinformatics, Myongji University, Yongin, 17058, Korea
  - Department of Research and Development, DNACARE Co. Ltd., Seoul, 06126, Korea
- Yeisoo Yu
  - Department of Research and Development, DNACARE Co. Ltd., Seoul, 06126, Korea
- Keunpyo Lee
  - International Technology Cooperation Center, Technology Cooperation Bureau, Rural Development Administration, Jeonju, 54875, Korea
- Byoung-Ohg Ahn
  - Genomics Division, Department of Agricultural Biotechnology, National Institute of Agricultural Science, Rural Development Administration, Jeonju, 54874, Korea
- Su Young Hong
  - Genomics Division, Department of Agricultural Biotechnology, National Institute of Agricultural Science, Rural Development Administration, Jeonju, 54874, Korea
- Joong Hyoun Chin
  - Food Crops Molecular Breeding Laboratory, Department of Integrative Biological Sciences and Industry, Sejong University, Seoul, 05006, Korea
  - Convergence Research Center for Natural Products, Sejong University, Seoul, 05006, Korea
9
Murga-Moreno J, Casillas S, Barbadilla A, Uricchio L, Enard D. An efficient and robust ABC approach to infer the rate and strength of adaptation. G3 (Bethesda, Md.) 2024; 14:jkae031. [PMID: 38365205 PMCID: PMC11090462 DOI: 10.1093/g3journal/jkae031] [Received: 10/10/2023] [Revised: 10/10/2023] [Accepted: 01/29/2024] [Indexed: 02/18/2024]
Abstract
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in nonmodel species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to nonmodel genomes. We apply ABC-MK to the human proteome and a set of known virus interacting proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
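As background for the McDonald-Kreitman (MK) framework that ABC-MK extends, the sketch below computes the classic point estimate of the proportion of adaptive substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps), from counts of nonsynonymous and synonymous divergence (Dn, Ds) and polymorphism (Pn, Ps). This is the simple textbook estimator, not the ABC-MK method itself; the function name `mk_alpha` and the counts are illustrative assumptions rather than values from the paper.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classic MK-based estimate of the fraction of adaptive substitutions.

    dn, ds: nonsynonymous / synonymous fixed differences (divergence)
    pn, ps: nonsynonymous / synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Illustrative counts only (hypothetical gene).
alpha = mk_alpha(dn=7, ds=17, pn=2, ps=42)
print(round(alpha, 3))  # 0.884
```

An alpha near zero is what neutrality predicts (the ratio Dn/Ds matches Pn/Ps), while values approaching one suggest that most nonsynonymous substitutions were fixed by positive selection. The simple estimator is known to be biased by slightly deleterious polymorphism, which is part of the motivation for more elaborate approaches.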
Collapse
Affiliation(s)
- Jesús Murga-Moreno
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85719, USA
| | - Sònia Casillas
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
| | - Antonio Barbadilla
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
| | | | - David Enard
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85719, USA
| |
|
10
|
Healey AL, Garsmeur O, Lovell JT, Shengquiang S, Sreedasyam A, Jenkins J, Plott CB, Piperidis N, Pompidor N, Llaca V, Metcalfe CJ, Doležel J, Cápal P, Carlson JW, Hoarau JY, Hervouet C, Zini C, Dievart A, Lipzen A, Williams M, Boston LB, Webber J, Keymanesh K, Tejomurthula S, Rajasekar S, Suchecki R, Furtado A, May G, Parakkal P, Simmons BA, Barry K, Henry RJ, Grimwood J, Aitken KS, Schmutz J, D'Hont A. The complex polyploid genome architecture of sugarcane. Nature 2024; 628:804-810. [PMID: 38538783 PMCID: PMC11041754 DOI: 10.1038/s41586-024-07231-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 02/23/2024] [Indexed: 04/06/2024]
Abstract
Sugarcane, the world's most harvested crop by tonnage, has shaped global history, trade and geopolitics, and is currently responsible for 80% of sugar production worldwide [1]. While traditional sugarcane breeding methods have effectively generated cultivars adapted to new environments and pathogens, sugar yield improvements have recently plateaued [2]. The cessation of yield gains may be due to limited genetic diversity within breeding populations, long breeding cycles and the complexity of its genome, the latter preventing breeders from taking advantage of the recent explosion of whole-genome sequencing that has benefited many other crops. Thus, modern sugarcane hybrids are the last remaining major crop without a reference-quality genome. Here we take a major step towards advancing sugarcane biotechnology by generating a polyploid reference genome for R570, a typical modern cultivar derived from interspecific hybridization between the domesticated species (Saccharum officinarum) and the wild species (Saccharum spontaneum). In contrast to the existing single haplotype ('monoploid') representation of R570, our 8.7 billion base assembly contains a complete representation of unique DNA sequences across the approximately 12 chromosome copies in this polyploid genome. Using this highly contiguous genome assembly, we filled a previously unsized gap within an R570 physical genetic map to describe the likely causal genes underlying the single-copy Bru1 brown rust resistance locus. This polyploid genome assembly with fine-grain descriptions of genome architecture and molecular targets for biotechnology will help accelerate molecular and transgenic breeding and adaptation of sugarcane to future environmental conditions.
Affiliation(s)
- A L Healey
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA.
| | - O Garsmeur
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - J T Lovell
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - S Shengquiang
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - A Sreedasyam
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - J Jenkins
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - C B Plott
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - N Piperidis
- Sugar Research Australia, Te Kowai, Queensland, Australia
| | - N Pompidor
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - V Llaca
- Corteva Agriscience, Johnston, IA, USA
| | - C J Metcalfe
- CSIRO Agriculture and Food, Queensland Bioscience Precinct, St Lucia, Queensland, Australia
| | - J Doležel
- Institute of Experimental Botany of the Czech Academy of Sciences, Centre of Plant Structural and Functional Genomics, Olomouc, Czech Republic
| | - P Cápal
- Institute of Experimental Botany of the Czech Academy of Sciences, Centre of Plant Structural and Functional Genomics, Olomouc, Czech Republic
| | - J W Carlson
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - J Y Hoarau
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
- ERCANE, Sainte-Clotilde, La Réunion, France
| | - C Hervouet
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - C Zini
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - A Dievart
- CIRAD, UMR AGAP Institut, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - A Lipzen
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - M Williams
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - L B Boston
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - J Webber
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - K Keymanesh
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - S Tejomurthula
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - S Rajasekar
- Arizona Genomics Institute, University of Arizona, Tucson, AZ, USA
| | - R Suchecki
- CSIRO Agriculture and Food, Urrbrae, South Australia, Australia
| | - A Furtado
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
| | - G May
- Corteva Agriscience, Johnston, IA, USA
| | | | - B A Simmons
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
- Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA
| | - K Barry
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - R J Henry
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
- ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Queensland, Brisbane, Queensland, Australia
| | - J Grimwood
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - K S Aitken
- CSIRO Agriculture and Food, Queensland Bioscience Precinct, St Lucia, Queensland, Australia
| | - J Schmutz
- Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA.
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
| | - A D'Hont
- CIRAD, UMR AGAP Institut, Montpellier, France.
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France.
| |
|
11
|
Chen M, Li L, Wang S, Wang P, Li Y. Transcriptome sequencing and screening of genes related to the MADS-box gene family in Clematis courtoisii. PLoS One 2024; 19:e0294426. [PMID: 38315679 PMCID: PMC10843124 DOI: 10.1371/journal.pone.0294426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 10/31/2023] [Indexed: 02/07/2024]
Abstract
The MADS-box gene family controls plant flowering and floral organ development; therefore, it is particularly important in ornamental plants. To investigate the genes associated with the MADS-box family in Clematis courtoisii, we performed full-length transcriptome sequencing on C. courtoisii using the PacBio Sequel third-generation sequencing platform, as no reference genome data were available. A total of 12.38 Gb of data, containing 9,476,585 subreads and 50,439 Unigenes, was obtained. A total of 37,923 Unigenes (75.18% of the total) were assigned functional annotations, and 50 Unigenes were identified as MADS-box related genes. Subsequently, we employed hmmerscan to perform a protein sequence similarity search on the translated Unigene sequences and identified 19 Unigenes associated with the MADS-box gene family, including MIKC* (1) and MIKC^C (18) genes. Furthermore, within the MIKC^C group, six subclasses can be distinguished.
Affiliation(s)
- Mingjian Chen
- Department of Ornamental Plant Research Center, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
| | - Linfang Li
- Department of Ornamental Plant Research Center, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
| | - Shu’an Wang
- Department of Ornamental Plant Research Center, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
| | - Peng Wang
- Department of Ornamental Plant Research Center, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
| | - Ya Li
- Department of Ornamental Plant Research Center, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, China
| |
|
12
|
Layoun P, López-Pérez M, Haro-Moreno JM, Haber M, Thrash JC, Henson MW, Kavagutti VS, Ghai R, Salcher MM. Flexible genomic island conservation across freshwater and marine Methylophilaceae. THE ISME JOURNAL 2024; 18:wrad036. [PMID: 38365254 PMCID: PMC10872708 DOI: 10.1093/ismejo/wrad036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 12/15/2023] [Accepted: 12/20/2023] [Indexed: 02/18/2024]
Abstract
The evolutionary trajectory of Methylophilaceae includes habitat transitions from freshwater sediments to freshwater and marine pelagial that resulted in genome reduction (genome-streamlining) of the pelagic taxa. However, the extent of genetic similarities in the genomic structure and microdiversity of the two genome-streamlined pelagic lineages (freshwater "Ca. Methylopumilus" and the marine OM43 lineage) has so far never been compared. Here, we analyzed complete genomes of 91 "Ca. Methylopumilus" strains isolated from 14 lakes in Central Europe and 12 coastal marine OM43 strains. The two lineages showed a remarkable niche differentiation with clear species-specific differences in habitat preference and seasonal distribution. On the other hand, we observed a synteny preservation in their genomes by having similar locations and types of flexible genomic islands (fGIs). Three main fGIs were identified: a replacement fGI acting as phage defense, an additive fGI harboring metabolic and resistance-related functions, and a tycheposon containing nitrogen-, thiamine-, and heme-related functions. The fGIs differed in relative abundances in metagenomic datasets suggesting different levels of variability ranging from strain-specific to population-level adaptations. Moreover, variations in one gene seemed to be responsible for different growth at low substrate concentrations and a potential biogeographic separation within one species. Our study provides a first insight into genomic microdiversity of closely related taxa within the family Methylophilaceae and revealed remarkably similar dynamics involving mobile genetic elements and recombination between freshwater and marine family members.
Affiliation(s)
- Paul Layoun
- Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre CAS, 37005 Ceske Budejovice, Czech Republic
- Faculty of Science, University of South Bohemia, 37005 Ceske Budejovice, Czech Republic
| | - Mario López-Pérez
- Evolutionary Genomics Group, División de Microbiología, Universidad Miguel Hernández, 03550 San Juan de Alicante, Spain
| | - Jose M Haro-Moreno
- Evolutionary Genomics Group, División de Microbiología, Universidad Miguel Hernández, 03550 San Juan de Alicante, Spain
| | - Markus Haber
- Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre CAS, 37005 Ceske Budejovice, Czech Republic
| | - J Cameron Thrash
- Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Michael W Henson
- Department of Geophysical Sciences, University of Chicago, Chicago, IL 60637, USA
| | - Vinicius Silva Kavagutti
- Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre CAS, 37005 Ceske Budejovice, Czech Republic
- Faculty of Science, University of South Bohemia, 37005 Ceske Budejovice, Czech Republic
| | - Rohit Ghai
- Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre CAS, 37005 Ceske Budejovice, Czech Republic
| | - Michaela M Salcher
- Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre CAS, 37005 Ceske Budejovice, Czech Republic
| |
|
13
|
Cochetel N, Minio A, Guarracino A, Garcia JF, Figueroa-Balderas R, Massonnet M, Kasuga T, Londo JP, Garrison E, Gaut BS, Cantu D. A super-pangenome of the North American wild grape species. Genome Biol 2023; 24:290. [PMID: 38111050 PMCID: PMC10729490 DOI: 10.1186/s13059-023-03133-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 07/14/2023] [Accepted: 11/30/2023] [Indexed: 12/20/2023]
Abstract
BACKGROUND Capturing the genetic diversity of wild relatives is crucial for improving crops because wild species are valuable sources of agronomic traits that are essential to enhance the sustainability and adaptability of domesticated cultivars. Genetic diversity across a genus can be captured in super-pangenomes, which provide a framework for interpreting genomic variations. RESULTS Here we report the sequencing, assembly, and annotation of nine wild North American grape genomes, which are phased and scaffolded at chromosome scale. We generate a reference-unbiased super-pangenome using pairwise whole-genome alignment methods, revealing the extent of the genomic diversity among wild grape species from sequence to gene level. The pangenome graph captures genomic variation between haplotypes within a species and across the different species, and it accurately assesses the similarity of hybrids to their parents. The species selected to build the pangenome are a great representation of the genus, as illustrated by capturing known allelic variants in the sex-determining region and for Pierce's disease resistance loci. Using pangenome-wide association analysis, we demonstrate the utility of the super-pangenome by effectively mapping short reads from genus-wide samples and identifying loci associated with salt tolerance in natural populations of grapes. CONCLUSIONS This study highlights how a reference-unbiased super-pangenome can reveal the genetic basis of adaptive traits from wild relatives and accelerate crop breeding research.
Affiliation(s)
- Noé Cochetel
- Department of Viticulture and Enology, University of California Davis, Davis, CA, USA
| | - Andrea Minio
- Department of Viticulture and Enology, University of California Davis, Davis, CA, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Human Technopole, Milan, Italy
| | - Jadran F Garcia
- Department of Viticulture and Enology, University of California Davis, Davis, CA, USA
| | | | - Mélanie Massonnet
- Department of Viticulture and Enology, University of California Davis, Davis, CA, USA
| | - Takao Kasuga
- Crops Pathology and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Davis, CA, USA
| | - Jason P Londo
- Horticulture Section, School of Integrative Plant Science, Cornell AgriTech, Cornell University, Geneva, NY, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Brandon S Gaut
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA, USA
| | - Dario Cantu
- Department of Viticulture and Enology, University of California Davis, Davis, CA, USA.
- Genome Center, University of California Davis, Davis, CA, USA.
| |
|
14
|
Dougan KE, Deng ZL, Wöhlbrand L, Reuse C, Bunk B, Chen Y, Hartlich J, Hiller K, John U, Kalvelage J, Mansky J, Neumann-Schaal M, Overmann J, Petersen J, Sanchez-Garcia S, Schmidt-Hohagen K, Shah S, Spröer C, Sztajer H, Wang H, Bhattacharya D, Rabus R, Jahn D, Chan CX, Wagner-Döbler I. Multi-omics analysis reveals the molecular response to heat stress in a "red tide" dinoflagellate. Genome Biol 2023; 24:265. [PMID: 37996937 PMCID: PMC10666404 DOI: 10.1186/s13059-023-03107-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/14/2022] [Accepted: 11/10/2023] [Indexed: 11/25/2023]
Abstract
BACKGROUND "Red tides" are harmful algal blooms caused by dinoflagellate microalgae that accumulate toxins lethal to other organisms, including humans via consumption of contaminated seafood. These algal blooms are driven by a combination of environmental factors including nutrient enrichment, particularly in warm waters, and are increasingly frequent. The molecular, regulatory, and evolutionary mechanisms that underlie the heat stress response in these harmful bloom-forming algal species remain little understood, due in part to the limited genomic resources from dinoflagellates, complicated by the large sizes of genomes, exhibiting features atypical of eukaryotes. RESULTS We present the de novo assembled genome (~4.75 Gbp with 85,849 protein-coding genes), transcriptome, proteome, and metabolome from Prorocentrum cordatum, a globally abundant, bloom-forming dinoflagellate. Using axenic algal cultures, we study the molecular mechanisms that underpin the algal response to heat stress, which is relevant to current ocean warming trends. We present the first evidence of a complementary interplay between RNA editing and exon usage that regulates the expression and functional diversity of biomolecules, reflected by reduction in photosynthesis, central metabolism, and protein synthesis. These results reveal genomic signatures and post-transcriptional regulation for the first time in a pelagic dinoflagellate. CONCLUSIONS Our multi-omics analyses uncover the molecular response to heat stress in an important bloom-forming algal species, which is driven by complex gene structures in a large, high-G+C genome, combined with multi-level transcriptional regulation. The dynamics and interplay of molecular regulatory mechanisms may explain in part how dinoflagellates diversified to become some of the most ecologically successful organisms on Earth.
Affiliation(s)
- Katherine E Dougan
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Zhi-Luo Deng
- Helmholtz-Center for Infection Research (HZI), Inhoffenstraße 7, Braunschweig, 38124, Germany
| | - Lars Wöhlbrand
- Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky University of Oldenburg, 26129, Oldenburg, Germany
| | - Carsten Reuse
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany
| | - Boyke Bunk
- German Culture Collection for Microorganisms and Cell Cultures (DSMZ), Inhoffenstraße 7B, 38124, Braunschweig, Germany
| | - Yibi Chen
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Juliane Hartlich
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany
| | - Karsten Hiller
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany
| | - Uwe John
- Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Am Handelshafen 12, 27570, Bremerhaven, Germany
- Helmholtz Institute for Functional Marine Biodiversity at the University of Oldenburg (HIFMB), Ammerländer Heerstraße 231, 26129, Oldenburg, Germany
| | - Jana Kalvelage
- Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky University of Oldenburg, 26129, Oldenburg, Germany
| | - Johannes Mansky
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany
| | - Meina Neumann-Schaal
- German Culture Collection for Microorganisms and Cell Cultures (DSMZ), Inhoffenstraße 7B, 38124, Braunschweig, Germany
| | - Jörg Overmann
- German Culture Collection for Microorganisms and Cell Cultures (DSMZ), Inhoffenstraße 7B, 38124, Braunschweig, Germany
| | - Jörn Petersen
- German Culture Collection for Microorganisms and Cell Cultures (DSMZ), Inhoffenstraße 7B, 38124, Braunschweig, Germany
| | - Selene Sanchez-Garcia
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany
| | - Kerstin Schmidt-Hohagen
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany
| | - Sarah Shah
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Cathrin Spröer
- German Culture Collection for Microorganisms and Cell Cultures (DSMZ), Inhoffenstraße 7B, 38124, Braunschweig, Germany
| | - Helena Sztajer
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany
| | - Hui Wang
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany
| | - Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08901, USA
| | - Ralf Rabus
- Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky University of Oldenburg, 26129, Oldenburg, Germany
| | - Dieter Jahn
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany
| | - Cheong Xin Chan
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia.
| | - Irene Wagner-Döbler
- Braunschweig Center for Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106, Brunswick, Germany.
| |
|
15
|
Murga-Moreno J, Casillas S, Barbadilla A, Uricchio L, Enard D. An efficient and robust ABC approach to infer the rate and strength of adaptation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.29.555322. [PMID: 37693550 PMCID: PMC10491248 DOI: 10.1101/2023.08.29.555322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in non-model species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to non-model genomes. We apply ABC-MK to the human proteome and a set of known Virus Interacting Proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
Affiliation(s)
- Jesús Murga-Moreno
- University of Arizona Department of Ecology and Evolutionary Biology, Tucson, USA
| | - Sònia Casillas
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
| | - Antonio Barbadilla
- Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
- Institute of Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona 08193, Spain
| | | | - David Enard
- University of Arizona Department of Ecology and Evolutionary Biology, Tucson, USA
| |
|
16
|
Bhardwaj V, Singh A, Choudhary A, Dalavi R, Ralte L, Chawngthu RL, Senthil Kumar N, Vijay N, Chande A. HIV-1 Vpr induces ciTRAN to prevent transcriptional repression of the provirus. SCIENCE ADVANCES 2023; 9:eadh9170. [PMID: 37672576 PMCID: PMC10482341 DOI: 10.1126/sciadv.adh9170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 08/02/2023] [Indexed: 09/08/2023]
Abstract
The functional consequences of circular RNA (circRNA) expression on HIV-1 replication are largely unknown. Using a customized protocol involving direct RNA nanopore sequencing, here, we captured circRNAs from HIV-1-infected T cells and identified ciTRAN, a circRNA that modulates HIV-1 transcription. We found that HIV-1 infection induces ciTRAN expression in a Vpr-dependent manner and that ciTRAN interacts with SRSF1, a protein known to repress HIV-1 transcription. Our results suggest that HIV-1 hijacks ciTRAN to exclude serine/arginine-rich splicing factor 1 (SRSF1) from the viral transcriptional complex, thereby promoting efficient viral transcription. In addition, we demonstrate that an SRSF1-inspired mimic can inhibit viral transcription regardless of ciTRAN induction. The hijacking of a host circRNA thus represents a previously unknown facet of primate lentiviruses in overcoming transmission bottlenecks.
Affiliation(s)
- Vipin Bhardwaj
- Molecular Virology Laboratory, Department of Biological Sciences, Indian Institute of Science Education and Research (IISER), Bhopal, India
| | - Aman Singh
- Molecular Virology Laboratory, Department of Biological Sciences, Indian Institute of Science Education and Research (IISER), Bhopal, India
| | - Aditi Choudhary
- Molecular Virology Laboratory, Department of Biological Sciences, Indian Institute of Science Education and Research (IISER), Bhopal, India
| | - Rishikesh Dalavi
- Molecular Virology Laboratory, Department of Biological Sciences, Indian Institute of Science Education and Research (IISER), Bhopal, India
| | | | | | | | - Nagarjun Vijay
- Computational and Evolutionary Genomics Laboratory, Department of Biological Sciences, Indian Institute of Science Education and Research (IISER), Bhopal, India
| | - Ajit Chande
- Molecular Virology Laboratory, Department of Biological Sciences, Indian Institute of Science Education and Research (IISER), Bhopal, India
| |
|
17
|
Taj B, Adeolu M, Xiong X, Ang J, Nursimulu N, Parkinson J. MetaPro: a scalable and reproducible data processing and analysis pipeline for metatranscriptomic investigation of microbial communities. MICROBIOME 2023; 11:143. [PMID: 37370188 DOI: 10.1186/s40168-023-01562-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/24/2022] [Accepted: 04/28/2023] [Indexed: 06/29/2023]
Abstract
BACKGROUND Whole microbiome RNASeq (metatranscriptomics) has emerged as a powerful technology to functionally interrogate microbial communities. A key challenge is how best to process, analyze, and interpret these complex datasets. In a typical application, a single metatranscriptomic dataset may comprise from tens to hundreds of millions of sequence reads. These reads must first be processed and filtered for low quality and potential contaminants, before being annotated with taxonomic and functional labels and subsequently collated to generate global bacterial gene expression profiles. RESULTS Here, we present MetaPro, a flexible, massively scalable metatranscriptomic data analysis pipeline that is cross-platform compatible through its implementation within a Docker framework. MetaPro starts with raw sequence read input (single-end or paired-end reads) and processes them through a tiered series of filtering, assembly, and annotation steps. In addition to yielding a final list of bacterial genes and their relative expression, MetaPro delivers a taxonomic breakdown based on the consensus of complementary prediction algorithms, together with a focused breakdown of enzymes, readily visualized through the Cytoscape network visualization tool. We benchmark the performance of MetaPro against two current state-of-the-art pipelines and demonstrate improved performance and functionality. CONCLUSIONS MetaPro represents an effective integrated solution for the processing and analysis of metatranscriptomic datasets. Its modular architecture allows new algorithms to be deployed as they are developed, ensuring its longevity. To aid user uptake of the pipeline, MetaPro, together with an established tutorial that has been developed for educational purposes, is made freely available at https://github.com/ParkinsonLab/MetaPro. The software is freely available under the GNU General Public License v3.
Affiliation(s)
- Billy Taj
  - Program in Molecular Medicine, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Mobolaji Adeolu
  - Program in Molecular Medicine, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Xuejian Xiong
  - Program in Molecular Medicine, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
- Jordan Ang
  - Department of Chemical and Physical Sciences, University of Toronto, Mississauga, ON, L5L 1C6, Canada
- Nirvana Nursimulu
  - Program in Molecular Medicine, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 3G4, Canada
- John Parkinson
  - Program in Molecular Medicine, The Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada.
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3G4, Canada.
  - Department of Biochemistry, University of Toronto, Toronto, ON, M5S 3G4, Canada.
|
18
|
Bortoletto E, Pieretti F, Brun P, Venier P, Leonardi A, Rosani U. Meta-Analysis of Keratoconus Transcriptomic Data Revealed Altered RNA Editing Levels Impacting Keratin Genomic Clusters. Invest Ophthalmol Vis Sci 2023; 64:12. [PMID: 37279397 DOI: 10.1167/iovs.64.7.12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Introduction Keratoconus (KC) is an ocular disorder with a multifactorial origin. Transcriptomic analyses (RNA-seq) revealed deregulations of coding (mRNA) and non-coding RNAs (ncRNAs) in KC, suggesting that mRNA-ncRNA co-regulations can promote the onset of KC. The present study investigates the modulation of RNA editing mediated by the adenosine deaminase acting on dsRNA (ADAR) enzyme in KC. Materials The levels of ADAR-mediated RNA editing in KC and healthy corneas were determined by two indexes in two different sequencing datasets. REDIportal was used to localize known editing sites, whereas new putative sites were identified de novo in the most extensive dataset only and their possible impact was evaluated. Western blot analysis was used to measure the level of ADAR1 in corneas from independent samples. Results KC was characterized by a significantly lower RNA-editing level compared to controls, resulting in a lower editing frequency and fewer edited bases. The distribution of the editing sites along the human genome showed considerable differences between groups, particularly in the chromosome 12 regions encoding the keratin type II cluster. A total of 32 recoding sites were characterized, 17 of which were novel. JUP, KRT17, KRT76, and KRT79 were edited with higher frequencies in KC than in controls, whereas BLCAP, COG3, KRT1, KRT75, and RRNAD1 were less edited. Neither gene expression nor protein levels of ADAR1 appeared to differ between diseased and control samples. Conclusions Our findings demonstrate altered RNA editing in KC, possibly linked to the peculiar cellular conditions. The functional implications should be further investigated.
Affiliation(s)
- Fabio Pieretti
  - Department of Molecular Medicine, Histology Unit, University of Padova, Padova, Italy
- Paola Brun
  - Department of Molecular Medicine, Histology Unit, University of Padova, Padova, Italy
- Paola Venier
  - Department of Biology, University of Padova, Padova, Italy
- Andrea Leonardi
  - Department of Neuroscience, Ophthalmology Unit, University of Padova, Padova, Italy
- Umberto Rosani
  - Department of Biology, University of Padova, Padova, Italy
|
19
|
Wang X, Zhu L, Ying S, Liao X, Zheng J, Liu Z, Gao J, Niu M, Xu X, Zhou Z, Xu H, Wu J. Increased RNA editing sites revealed as potential novel biomarkers for diagnosis in primary Sjögren's syndrome. J Autoimmun 2023; 138:103035. [PMID: 37216868 DOI: 10.1016/j.jaut.2023.103035] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/21/2022] [Revised: 01/16/2023] [Accepted: 03/20/2023] [Indexed: 05/24/2023]
Abstract
BACKGROUND Transcriptome-wide aberrant RNA editing has been shown to contribute to autoimmune diseases, but its extent and significance in primary Sjögren's syndrome (pSS) are currently poorly understood. METHODS We systematically characterized the global pattern and clinical relevance of RNA editing in pSS by performing large-scale RNA sequencing of minor salivary gland tissues obtained from 439 pSS patients and 130 non-pSS or healthy controls. FINDINGS Compared with controls, pSS patients displayed increased global RNA-editing levels, which were significantly correlated and clinically relevant to various immune features in pSS. The elevated editing levels were likely explained by significantly increased expression of adenosine deaminase acting on RNA 1 (ADAR1) p150 in pSS, which was associated with disease features. In addition, genome-wide differential RNA editing (DRE) analysis between pSS and non-pSS showed that most (249/284) DRE sites were hyper-edited in pSS, especially the top 10 DRE sites dominated by hyper-edited sites and assigned to nine unique genes involved in the inflammatory response or immune system. Interestingly, among all DRE sites, six RNA editing sites were only detected in pSS and resided in three unique genes (NLRC5, IKZF3 and JAK3). Furthermore, these six specific DRE sites with significant clinical relevance in pSS showed a strong capacity to distinguish between pSS and non-pSS, reflecting powerful diagnostic efficacy and accuracy. CONCLUSION These findings reveal the potential role of RNA editing in contributing to the risk of pSS and further highlight the important prognostic value and diagnostic potential of RNA editing in pSS.
Affiliation(s)
- Xiaobing Wang
  - Department of Rheumatology and Immunology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China
- Lingxiao Zhu
  - Department of Rheumatology and Immunology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Senhong Ying
  - Precision Medicine Center, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Xin Liao
  - The Center for Clinical Molecular Medical Detection, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Junjie Zheng
  - Department of Rheumatology and Immunology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Zhenwei Liu
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Jianxia Gao
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China
- Miaomiao Niu
  - Ningbo Health Gene Technologies Co, Ningbo, China
- Xin Xu
  - Shandong Cancer Hospital and Institute, Jinan, China
- Zihao Zhou
  - Department of Clinical Laboratory, The Third People's Hospital of Shenzhen, Southern University of Science and Technology, National Clinical Research Center for Infectious Diseases, Shenzhen, China
- Huji Xu
  - Department of Rheumatology and Immunology, Shanghai Changzheng Hospital, Naval Medical University, Shanghai, China; Peking-Tsinghua Center for Life Sciences, Tsinghua University, Beijing, China; School of Clinical Medicine, Tsinghua University, Beijing, China.
- Jinyu Wu
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, China.
|
20
|
Chakrabarti AM, Iosub IA, Lee FCY, Ule J, Luscombe NM. A computationally-enhanced hiCLIP atlas reveals Staufen1-RNA binding features and links 3' UTR structure to RNA metabolism. Nucleic Acids Res 2023; 51:3573-3589. [PMID: 37013995 PMCID: PMC10164587 DOI: 10.1093/nar/gkad221] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/15/2022] [Revised: 02/08/2023] [Accepted: 03/31/2023] [Indexed: 04/05/2023]
Abstract
The structure of mRNA molecules plays an important role in their interactions with trans-acting factors, notably RNA binding proteins (RBPs), thus contributing to the functional consequences of this interplay. However, current transcriptome-wide experimental methods to chart these interactions are limited by their poor sensitivity. Here we extend the hiCLIP atlas of duplexes bound by Staufen1 (STAU1) ∼10-fold, through careful consideration of experimental assumptions and the development of bespoke computational methods, which we apply to existing data. We present Tosca, a Nextflow computational pipeline for the processing, analysis and visualisation of proximity ligation sequencing data in general. We use our extended duplex atlas to discover insights into the RNA selectivity of STAU1, revealing the importance of structural symmetry and duplex-span-dependent nucleotide composition. Furthermore, we identify heterogeneity in the relationship between transcripts with STAU1-bound 3' UTR duplexes and metabolism of the associated RNAs that we relate to RNA structure: transcripts with short-range proximal 3' UTR duplexes have high degradation rates, but those with long-range duplexes have low rates. Overall, our work enables the integrative analysis of proximity ligation data, delivering insights into specific features and effects of RBP-RNA structure interactions.
Affiliation(s)
- Ira A Iosub
  - The Francis Crick Institute, London, NW1 4AT, UK
- Flora C Y Lee
  - The Francis Crick Institute, London, NW1 4AT, UK
  - Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, WC1N 3BG, UK
- Jernej Ule
  - The Francis Crick Institute, London, NW1 4AT, UK
  - Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, WC1N 3BG, UK
  - UK Dementia Research Institute at King's College London, Institute of Psychiatry, Psychology and Neuroscience, London, SE5 9RX, UK
- Nicholas M Luscombe
  - The Francis Crick Institute, London, NW1 4AT, UK
  - Okinawa Institute of Science and Technology Graduate University, Onna-son, Okinawa 904-0495, Japan
|
21
|
Li Y, Shi X, Zuo Y, Li T, Liu L, Shen Z, Shen J, Zhang R, Wang S. Multiplexed Target Enrichment Enables Efficient and In-Depth Analysis of Antimicrobial Resistome in Metagenomes. Microbiol Spectr 2022; 10:e0229722. [PMID: 36287061 PMCID: PMC9769626 DOI: 10.1128/spectrum.02297-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Accepted: 10/04/2022] [Indexed: 01/06/2023]
Abstract
Antibiotic resistance genes (ARGs) pose a serious threat to public health and ecological security in the 21st century. However, the resistome only accounts for a tiny fraction of metagenomic content, which makes it difficult to investigate low-abundance ARGs in various environmental settings. Thus, a highly sensitive, accurate, and comprehensive method is needed to describe ARG profiles in complex metagenomic samples. In this study, we established a high-throughput sequencing method based on targeted amplification, which could simultaneously detect ARGs (n = 251), mobile genetic element genes (n = 8), and metal resistance genes (n = 19) in metagenomes. The performance of amplicon sequencing was compared with traditional metagenomic shotgun sequencing (MetaSeq). A total of 1421 primer pairs were designed, achieving extremely high coverage of target genes. The amplicon sequencing significantly improved the recovery of target ARGs (~9 × 10^4-fold), with higher sensitivity and diversity and lower cost and computational burden. Furthermore, targeted enrichment allows deep scanning of single nucleotide polymorphisms (SNPs), and elevated SNP detection was shown in this study. We further applied this approach to 48 environmental samples (37 feces, 20 soils, and 7 sewage) and 16 clinical samples. All samples tested in this study showed high diversity and recovery of targeted genes. Our results demonstrated that the approach could be applied to various metagenomic samples and could serve as an efficient tool in the surveillance and evolution assessment of ARGs. Access to the resistome using the enrichment method validated in this study enabled the capture of low-abundance resistomes while being less costly and time-consuming, which can greatly advance our understanding of local and global resistome dynamics.
IMPORTANCE ARGs, an increasing global threat to human health, can be transferred into health-related microorganisms in the environment by horizontal gene transfer, posing a serious threat to public health. Advanced profiling methods are needed for monitoring and predicting the potential risks of ARGs in metagenomes. Our study described a customized amplicon sequencing assay that could enable a high-throughput, targeted, in-depth analysis of ARGs and detect a low-abundance portion of resistomes. This method could serve as an efficient tool to assess the variation and evolution of specific ARGs in clinical and natural environments.
Affiliation(s)
- Yiming Li
  - Beijing Key Laboratory of Detection Technology for Animal-Derived Food Safety, College of Veterinary Medicine, China Agricultural University, Beijing, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
- Xiaomin Shi
  - Beijing Key Laboratory of Detection Technology for Animal-Derived Food Safety, College of Veterinary Medicine, China Agricultural University, Beijing, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
- Yang Zuo
  - Beijing Key Laboratory of Detection Technology for Animal-Derived Food Safety, College of Veterinary Medicine, China Agricultural University, Beijing, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
- Tian Li
  - Beijing Key Laboratory of Detection Technology for Animal-Derived Food Safety, College of Veterinary Medicine, China Agricultural University, Beijing, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
- Lu Liu
  - Beijing Key Laboratory of Detection Technology for Animal-Derived Food Safety, College of Veterinary Medicine, China Agricultural University, Beijing, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
- Zhangqi Shen
  - Beijing Key Laboratory of Detection Technology for Animal-Derived Food Safety, College of Veterinary Medicine, China Agricultural University, Beijing, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
- Jianzhong Shen
  - Beijing Key Laboratory of Detection Technology for Animal-Derived Food Safety, College of Veterinary Medicine, China Agricultural University, Beijing, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
- Rong Zhang
  - The Second Affiliated Hospital of Zhejiang University, Zhejiang University, Hangzhou, China
- Shaolin Wang
  - Beijing Key Laboratory of Detection Technology for Animal-Derived Food Safety, College of Veterinary Medicine, China Agricultural University, Beijing, China
  - Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, China
|
22
|
Etherington GJ, Nash W, Ciezarek A, Mehta TK, Barria A, Peñaloza C, Khan MGQ, Durrant A, Forrester N, Fraser F, Irish N, Kaithakottil GG, Lipscombe J, Trong T, Watkins C, Swarbreck D, Angiolini E, Cnaani A, Gharbi K, Houston RD, Benzie JAH, Haerty W. Chromosome-level genome sequence of the Genetically Improved Farmed Tilapia (GIFT, Oreochromis niloticus) highlights regions of introgression with O. mossambicus. BMC Genomics 2022; 23:832. [PMID: 36522771 PMCID: PMC9756657 DOI: 10.1186/s12864-022-09065-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/10/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND The Nile tilapia (Oreochromis niloticus) is the third most important freshwater fish for aquaculture. Its success is directly linked to continuous breeding efforts focusing on production traits such as growth rate and weight. Among those elite strains, the Genetically Improved Farmed Tilapia (GIFT) programme initiated by WorldFish is now distributed worldwide. To accelerate the development of the GIFT strain through genomic selection, a high-quality reference genome is necessary. RESULTS Using a combination of short (10X Genomics) and long read (PacBio HiFi, PacBio CLR) sequencing and a genetic map for the GIFT strain, we generated a chromosome level genome assembly for the GIFT. Using genomes of two closely related species (O. mossambicus, O. aureus), we characterised the extent of introgression between these species and O. niloticus that has occurred during the breeding process. Over 11 Mb of O. mossambicus genomic material could be identified within the GIFT genome, including genes associated with immunity but also with traits of interest such as growth rate. CONCLUSION Because of the breeding history of elite strains, current reference genomes might not be the most suitable to support further studies into the GIFT strain. We generated a chromosome level assembly of the GIFT strain, characterising its mixed origins, and the potential contributions of introgressed regions to selected traits.
Affiliation(s)
- G J Etherington
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- W Nash
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- A Ciezarek
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- T K Mehta
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- A Barria
  - The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- C Peñaloza
  - The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- M G Q Khan
  - The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
  - Department of Fisheries Biology and Genetics, Bangladesh Agricultural University, Mymensingh, 2202, Bangladesh
- A Durrant
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- N Forrester
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- F Fraser
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- N Irish
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- G G Kaithakottil
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- J Lipscombe
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- T Trong
  - WorldFish, 10670, Penang, Malaysia
- C Watkins
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- D Swarbreck
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- E Angiolini
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- A Cnaani
  - Department of Poultry and Aquaculture, Institute of Animal Science, Agricultural Research Organization - Volcani Institute, Rishon LeTsiyon, Israel
- K Gharbi
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- R D Houston
  - The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
  - Benchmark Genetics, 1 Pioneer Building, Edinburgh Technopole, Penicuik, EH26 0GB, UK
- W Haerty
  - Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK.
  - School of Biological Sciences, University of East Anglia, Norwich, UK.
|
23
|
Sim JXF, Drigo B, Doolette CL, Vasileiadis S, Karpouzas DG, Lombi E. Impact of twenty pesticides on soil carbon microbial functions and community composition. CHEMOSPHERE 2022; 307:135820. [PMID: 35944675 DOI: 10.1016/j.chemosphere.2022.135820] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/26/2022] [Revised: 07/20/2022] [Accepted: 07/21/2022] [Indexed: 05/20/2023]
Abstract
Pesticides are known to affect non-targeted soil microorganisms. Still, studies comparing the effect of multiple pesticides on a wide range of microbial endpoints associated with carbon cycling are scarce. Here, we employed fluorescence enzymatic assay and real-time PCR to evaluate the effect of 20 commercial pesticides, applied at their recommended dose and five times their recommended dose, on soil carbon cycling related enzymatic activities (α-1,4-glucosidase, β-1,4-glucosidase, β-d-cellobiohydrolase and β-xylosidase), and on the absolute abundance of functional genes (cbhl and chiA), in three different South Australian agricultural soils. The effects on cellulolytic and chitinolytic microorganisms, and the total microbial community composition were determined using shotgun metagenomic sequencing in selected pesticide-treated and untreated samples. The application of insecticides significantly increased the cbhl and chiA genes absolute abundance in the acidic soil. At the community level, insecticide fipronil had the greatest stimulating effect on cellulolytic and chitinolytic microorganisms, followed by fungicide metalaxyl-M and insecticide imidacloprid. A shift towards a fungal dominated microbial community was observed in metalaxyl-M treated soil. Overall, our results suggest that the application of pesticides might affect the soil carbon cycle and may disrupt the formation of soil organic matter and structure stabilisation.
Affiliation(s)
- Jowenna X F Sim
  - Future Industries Institute, University of South Australia, Mawson Lakes, SA, 5095, Australia.
- Barbara Drigo
  - Future Industries Institute, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Casey L Doolette
  - Future Industries Institute, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Sotirios Vasileiadis
  - University of Thessaly, Department of Biochemistry and Biotechnology, Laboratory of Plant and Environmental Biotechnology, Larissa, Viopolis, 41500, Greece
- Dimitrios G Karpouzas
  - University of Thessaly, Department of Biochemistry and Biotechnology, Laboratory of Plant and Environmental Biotechnology, Larissa, Viopolis, 41500, Greece
- Enzo Lombi
  - Future Industries Institute, University of South Australia, Mawson Lakes, SA, 5095, Australia; University of South Australia, UniSA STEM, Mawson Lakes, South Australia, 5095, Australia
|
24
|
Draft genome of the bluefin tuna blood fluke, Cardicola forsteri. PLoS One 2022; 17:e0276287. [PMID: 36240154 PMCID: PMC9565688 DOI: 10.1371/journal.pone.0276287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 10/03/2022] [Indexed: 11/12/2022]
Abstract
The blood fluke Cardicola forsteri (Trematoda: Aporocotylidae) is a pathogen of ranched bluefin tuna in Japan and Australia. Genomics of Cardicola spp. have thus far been limited to molecular phylogenetics of select gene sequences. In this study, sequencing of the C. forsteri genome was performed using Illumina short-read and Oxford Nanopore long-read technologies. The sequences were assembled de novo using a hybrid of short and long reads, which produced a high-quality contig-level assembly (N50 > 430 kb and L50 = 138). The assembly was also relatively complete and unfragmented, comprising 66% and 7.2% complete and fragmented metazoan Benchmarking Universal Single-Copy Orthologs (BUSCOs), respectively. A large portion (> 55%) of the genome was made up of intergenic repetitive elements, primarily long interspersed nuclear elements (LINEs), while protein-coding regions cover > 6%. Gene prediction identified 8,564 hypothetical polypeptides, > 77% of which are homologous to published sequences of other species. The identification of select putative proteins, including cathepsins, calpains, tetraspanins, and glycosyltransferases is discussed. This is the first genome assembly of any aporocotylid, a major step toward understanding of the biology of this family of fish blood flukes and their interactions within hosts.
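The contiguity statistics quoted in this abstract (N50 and L50) can be illustrated with a minimal sketch. It uses the common definitions: N50 is the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size, and L50 is the number of contigs needed to get there. The function name `n50_l50` and the toy contig lengths are hypothetical and are not taken from the paper.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total
    assembly size, and L50 is how many contigs were needed to reach it.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy example with made-up contig lengths (arbitrary units).
toy_contigs = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(toy_contigs)
print(f"N50 = {n50}, L50 = {l50}")
```

Read this way, the reported N50 > 430 kb with L50 = 138 means that the 138 longest contigs, each longer than 430 kb, together cover at least half of the assembled sequence.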
|
25
|
Bai Y, Gao X, Wang H, Ye L, Zhang X, Huang W, Long X, Yang K, Li G, Luo J, Wang J, Yu Y. Comparative mitogenome analysis reveals mitochondrial genome characteristics in eight strains of Beauveria. PeerJ 2022; 10:e14067. [PMID: 36193428 PMCID: PMC9526403 DOI: 10.7717/peerj.14067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/07/2022] [Accepted: 08/26/2022] [Indexed: 01/20/2023]
Abstract
Despite the significant progress that has been made in the genome sequencing of Beauveria species, the mitochondrial genome (mitogenome) is still used to examine genetic diversity within fungal populations. Complete mitogenomes of Beauveria species can be easily sequenced and assembled using various sequencing techniques. However, mitogenome annotations are mainly derived from comparison with similar species and software prediction and are not supported by RNA-seq transcript data, which leads to problems with the accuracy of mitochondrial annotations and an inability to understand RNA processing. In this study, we assembled and annotated the mitogenomes of eight Beauveria strains using Illumina DNA and RNA sequencing data. The circular mitogenomes of the eight Beauveria strains ranged from 26,850 bp (B. caledonica strain ATCC 64970) to 35,999 bp (B. brongniartii strain GYU-BMZ03), with intronic insertions accounting for most of the size variation and contributing 7.01% and 28.95% of the total mitogenome size, respectively. Intron number variations were not directly related to evolutionary distance. Except for those of ribosomal protein S3 (rps3), most introns are lost too quickly and lack the stability of protein-coding genes. The short RNA-seq reads from next-generation sequencing can improve mitochondrial annotation accuracy and help study polycistronic transcripts and RNA processing. The transcription initiation sites may be located in the control region. Most introns do not serve as taxonomic markers and also lack open reading frames (ORFs).
We assumed that the poly A tail was added to the polycistronic transcript before splicing and one polycistronic transcript (trnM (1)-trnL (1)-trnA-trnF-trnK-trnL (2)-trnQ-trnH-trnM (2)-nad2-nad3-atp9-cox2-trnR (1)-nad4L-nad5-cob-trnC-cox1-trnR (2)-nad1-nad4-atp8-atp6-rns-trnY-trnD-trnS-trnN-cox3-trnG-nad6-trnV-trnI-trnS-trnW-trnP-rnl(rps3)-trnT-trnE-trnM (3)) was first processed from the mitogenome and was subsequently processed into smaller mono-, di-, or tricistronic RNAs.
Affiliation(s)
- Yu Bai
  - College of Mathematics & Information Science, Guiyang University, Guiyang, China; Guangxi Key Laboratory of Biology for Crop Diseases and Insect Pests, Guangxi Academy of Agricultural Sciences, Nanning, China
- Xuyuan Gao
  - Guangxi Key Laboratory of Biology for Crop Diseases and Insect Pests, Guangxi Academy of Agricultural Sciences, Nanning, China
- Hui Wang
  - Guizhou Provincial Key Laboratory for Rare Animal and Economic Insects of the Mountainous Region, Guiyang University, Guiyang, China
- Lin Ye
  - College of Biology and Environmental Engineering, Guiyang University, Guiyang, China
- Xianqun Zhang
  - College of Biology and Environmental Engineering, Guiyang University, Guiyang, China
- Wei Huang
  - College of Biology and Environmental Engineering, Guiyang University, Guiyang, China
- Xiuzhen Long
  - Guangxi Key Laboratory of Biology for Crop Diseases and Insect Pests, Guangxi Academy of Agricultural Sciences, Nanning, China
- Kang Yang
  - College of Biology and Environmental Engineering, Guiyang University, Guiyang, China
- Guoyong Li
  - Guizhou Provincial Key Laboratory for Rare Animal and Economic Insects of the Mountainous Region, Guiyang University, Guiyang, China
- Jianlin Luo
  - Guizhou Provincial Key Laboratory for Rare Animal and Economic Insects of the Mountainous Region, Guiyang University, Guiyang, China
- Jiyue Wang
  - College of Biology and Environmental Engineering, Guiyang University, Guiyang, China
- Yonghao Yu
  - Guangxi Key Laboratory of Biology for Crop Diseases and Insect Pests, Guangxi Academy of Agricultural Sciences, Nanning, China
|
26
|
ADAR2 Protein Is Associated with Overall Survival in GBM Patients and Its Decrease Triggers the Anchorage-Independent Cell Growth Signature. Biomolecules 2022; 12:biom12081142. [PMID: 36009036 PMCID: PMC9405742 DOI: 10.3390/biom12081142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Revised: 06/14/2022] [Accepted: 08/15/2022] [Indexed: 11/16/2022]
Abstract
Background: Epitranscriptomic mechanisms, such as A-to-I RNA editing mediated by ADAR deaminases, contribute to cancer heterogeneity and patient stratification. ADAR enzymes can change the sequence, structure, and expression of several RNAs, affecting cancer cell behavior. In glioblastoma (GBM), an overall decrease in ADAR2 RNA level/activity has been reported. However, no data on ADAR2 protein levels in GBM patient tissues are available, and most data are based on ADAR overexpression experiments. Methods: We performed IHC analysis on GBM tissues and correlated ADAR2 levels with patients’ overall survival. We silenced ADAR2 in GBM cells, studied cell behavior, and performed a gene expression/editing analysis. Results: Not all GBM tissues show a low or absent ADAR2 level, as would be expected from previous studies. Rather, different amounts of ADAR2 protein were observed in different patients, with a low level correlating with a poor patient outcome. Indeed, reducing the endogenous ADAR2 protein in GBM cells promotes cell proliferation and migration and changes the cell’s program to an anchorage-independent growth mode. In addition, deep-seq data and bioinformatics analysis indicated that multiple RNAs are differently expressed/edited upon siADAR2. Conclusion: ADAR2 protein is an important deaminase in GBM and its amount correlates with patient prognosis.
|
27
|
Abstract
The recovery of DNA from viromes is a major obstacle in the use of long-read sequencing to study their genomes. For this reason, the use of cellular metagenomes (>0.2-μm size range) emerges as an interesting complementary tool, since they contain large amounts of naturally amplified viral genomes from prelytic replication. We have applied second-generation (Illumina NextSeq; short reads) and third-generation (PacBio Sequel II; long reads) sequencing to compare the diversity and features of the viral community in a marine sample obtained from offshore waters of the western Mediterranean. We found that a major wedge of the expected marine viral diversity was directly recovered by the raw PacBio circular consensus sequencing (CCS) reads. More than 30,000 sequences were detected only in this data set, with no homologues in the long- and short-read assembly, and ca. 26,000 had no homologues in the large data set of the Global Ocean Virome 2 (GOV2), highlighting the information gap created by the assembly bias. At the level of complete viral genomes, the performance was similar in both approaches. However, the hybrid long- and short-read assembly provided the longest average length of the sequences and improved the host assignment. Although no novel major clades of viruses were found, there was an increase in the intraclade genomic diversity recovered by long reads that produced an enriched assessment of the real diversity and allowed the discovery of novel genes with biotechnological potential (e.g., endolysin genes). IMPORTANCE We explored the vast genetic diversity of environmental viruses by using a combination of cellular metagenome (as opposed to virome) sequencing using high-fidelity long-read sequences (in this case, PacBio CCS). This approach resulted in the recovery of a representative sample of the viral population, and it performed better (more phage contigs, larger average contig size) than Illumina sequencing applied to the same sample. 
This approach avoids the many biases of assembly, as the CCS reads (typically around 5 kb) recover complete genes and even operons, resulting in better discovery of the viral gene diversity based on viral marker proteins. Thus, biotechnologically promising genes, such as endolysin genes, can be searched for very efficiently with this approach. In addition, hybrid assembly produces more complete and longer contigs, which is particularly important for studying little-known viral groups such as the nucleocytoplasmic large DNA viruses (NCLDV).
|
28
|
Lo R, Dougan KE, Chen Y, Shah S, Bhattacharya D, Chan CX. Alignment-Free Analysis of Whole-Genome Sequences From Symbiodiniaceae Reveals Different Phylogenetic Signals in Distinct Regions. FRONTIERS IN PLANT SCIENCE 2022; 13:815714. [PMID: 35557718 PMCID: PMC9087856 DOI: 10.3389/fpls.2022.815714] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/15/2021] [Accepted: 04/04/2022] [Indexed: 05/24/2023]
Abstract
Dinoflagellates of the family Symbiodiniaceae are predominantly essential symbionts of corals and other marine organisms. Recent research reveals extensive genome sequence divergence among Symbiodiniaceae taxa and high phylogenetic diversity hidden behind subtly different cell morphologies. Using an alignment-free phylogenetic approach based on sub-sequences of fixed length k (i.e. k-mers), we assessed the phylogenetic signal among whole-genome sequences from 16 Symbiodiniaceae taxa (including the genera of Symbiodinium, Breviolum, Cladocopium, Durusdinium and Fugacium) and two strains of Polarella glacialis as outgroup. Based on phylogenetic trees inferred from k-mers in distinct genomic regions (i.e. repeat-masked genome sequences, protein-coding sequences, introns and repeats) and in protein sequences, the phylogenetic signal associated with protein-coding DNA and the encoded amino acids is largely consistent with the Symbiodiniaceae phylogeny based on established markers, such as large subunit rRNA. The other genome sequences (introns and repeats) exhibit distinct phylogenetic signals, supporting the expected differential evolutionary pressure acting on these regions. Our analysis of conserved core k-mers revealed the prevalence of conserved k-mers (>95% core 23-mers among all 18 genomes) in annotated repeats and non-genic regions of the genomes. We observed 180 distinct repeat types that are significantly enriched in genomes of the symbiotic versus free-living Symbiodinium taxa, suggesting an enhanced activity of transposable elements linked to the symbiotic lifestyle. We provide evidence that representation of alignment-free phylogenies as dynamic networks enhances the ability to generate new hypotheses about genome evolution in Symbiodiniaceae. 
These results demonstrate the potential of alignment-free phylogenetic methods as a scalable approach for inferring comprehensive, unbiased whole-genome phylogenies of dinoflagellates and more broadly of microbial eukaryotes.
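The k-mer idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names are hypothetical, the Jaccard distance is an assumed stand-in for whatever distance the study used, and reverse complements are ignored for brevity.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def core_kmers(genomes: Dict[str, str], k: int) -> Set[str]:
    """Return the k-mers present in every genome (the 'core' k-mers)."""
    sets = [kmers(s, k) for s in genomes.values()]
    return set.intersection(*sets) if sets else set()


def jaccard_distance(a: str, b: str, k: int) -> float:
    """Alignment-free distance: 1 - |shared k-mers| / |all k-mers|.

    Assumption: Jaccard is used here only as a simple example of a
    k-mer based distance; it is not taken from the cited study.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


if __name__ == "__main__":
    toy = {"a": "ACGTAC", "b": "CGTACG", "c": "TTACGT"}
    print(sorted(core_kmers(toy, 3)))
    print(jaccard_distance(toy["a"], toy["c"], 3))
```

A pairwise matrix of such distances could then be passed to any distance-based tree-building method; real genomes would need canonical k-mers and a memory-efficient counter.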
Affiliation(s)
- Rosalyn Lo
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
| | - Katherine E. Dougan
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
| | - Yibi Chen
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
| | - Sarah Shah
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
| | - Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, United States
| | - Cheong Xin Chan
- Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
|
29
|
Magnier J, Druet T, Naves M, Ouvrard M, Raoul S, Janelle J, Moazami-Goudarzi K, Lesnoff M, Tillard E, Gautier M, Flori L. The genetic history of Mayotte and Madagascar cattle breeds mirrors the complex pattern of human exchanges in Western Indian Ocean. G3 GENES|GENOMES|GENETICS 2022; 12:6523972. [PMID: 35137043 PMCID: PMC8982424 DOI: 10.1093/g3journal/jkac029] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Accepted: 01/11/2022] [Indexed: 11/16/2022]
Abstract
Despite their central economic and cultural role, the origin of cattle populations living in Indian Ocean islands still remains poorly documented. Here, we unravel the demographic and adaptive histories of the extant Zebus from the Mayotte and Madagascar islands using high-density SNP genotyping data. We found that these populations are very closely related and both display a predominant indicine ancestry. They diverged in the 16th century at the arrival of European people who transformed the trade network in the area. Their common ancestral cattle population originates from an admixture between an admixed African zebu population and an Indian zebu that occurred around the 12th century at the time of the earliest contacts between human African populations of the Swahili corridor and Austronesian people from Southeast Asia in Comoros and Madagascar. A steep increase in the estimated population sizes from the beginning of the 16th to the 17th century coincides with the expansion of the cattle trade. By carrying out genome scans for recent selection in the two cattle populations from Mayotte and Madagascar, we identified sets of candidate genes involved in biological functions (cancer, skin structure, and UV-protection, nervous system and behavior, organ development, metabolism, and immune response) broadly representative of the physiological adaptation to tropical conditions. Overall, the origin of the cattle populations from Western Indian Ocean islands mirrors the complex history of human migrations and trade in this area.
Affiliation(s)
- Jessica Magnier
- SELMET, University of Montpellier, CIRAD, INRAE, L’Institut Agro, Montpellier 34398, France
- CIRAD, UMR SELMET, Montpellier 34398, France
| | - Tom Druet
- Unit of Animal Genomics, GIGA-R, Faculty of Veterinary Medicine, University of Liège, Liège 4000, Belgium
| | | | | | | | - Jérôme Janelle
- SELMET, University of Montpellier, CIRAD, INRAE, L’Institut Agro, Montpellier 34398, France
- CIRAD, UMR SELMET, Saint-Pierre 97410, France
| | | | - Matthieu Lesnoff
- SELMET, University of Montpellier, CIRAD, INRAE, L’Institut Agro, Montpellier 34398, France
- CIRAD, UMR SELMET, Montpellier 34398, France
| | - Emmanuel Tillard
- SELMET, University of Montpellier, CIRAD, INRAE, L’Institut Agro, Montpellier 34398, France
- CIRAD, UMR SELMET, Saint-Pierre 97410, France
| | - Mathieu Gautier
- CBGP, INRAE, CIRAD, IRD, L’Institut Agro, University of Montpellier, Montferrier sur Lez 34988, France
| | - Laurence Flori
- SELMET, INRAE, CIRAD, L’Institut Agro, University of Montpellier, Montpellier 34398, France
|
30
|
Satou Y, Tokuoka M, Oda-Ishii I, Tokuhiro S, Ishida T, Liu B, Iwamura Y. A Manually Curated Gene Model Set for an Ascidian, Ciona robusta (Ciona intestinalis Type A). Zoolog Sci 2022; 39:253-260. [DOI: 10.2108/zs210102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2021] [Accepted: 01/15/2022] [Indexed: 11/17/2022]
Affiliation(s)
- Yutaka Satou
- Department of Zoology, Graduate School of Science, Kyoto University, Sakyo, Kyoto 606-8502, Japan
| | - Miki Tokuoka
- Department of Zoology, Graduate School of Science, Kyoto University, Sakyo, Kyoto 606-8502, Japan
| | - Izumi Oda-Ishii
- Department of Zoology, Graduate School of Science, Kyoto University, Sakyo, Kyoto 606-8502, Japan
| | - Sinichi Tokuhiro
- Department of Zoology, Graduate School of Science, Kyoto University, Sakyo, Kyoto 606-8502, Japan
| | - Tasuku Ishida
- Department of Zoology, Graduate School of Science, Kyoto University, Sakyo, Kyoto 606-8502, Japan
| | - Boqi Liu
- Department of Zoology, Graduate School of Science, Kyoto University, Sakyo, Kyoto 606-8502, Japan
| | - Yuri Iwamura
- Department of Zoology, Graduate School of Science, Kyoto University, Sakyo, Kyoto 606-8502, Japan
|
31
|
Zhang Y, Zou D, Zhu T, Xu T, Chen M, Niu G, Zong W, Pan R, Jing W, Sang J, Liu C, Xiong Y, Sun Y, Zhai S, Chen H, Zhao W, Xiao J, Bao Y, Hao L, Zhang Z. Gene Expression Nebulas (GEN): a comprehensive data portal integrating transcriptomic profiles across multiple species at both bulk and single-cell levels. Nucleic Acids Res 2022; 50:D1016-D1024. [PMID: 34591957 PMCID: PMC8728231 DOI: 10.1093/nar/gkab878] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2021] [Revised: 09/15/2021] [Accepted: 09/17/2021] [Indexed: 01/07/2023] Open
Abstract
Transcriptomic profiling is critical to uncovering functional elements from transcriptional and post-transcriptional aspects. Here, we present Gene Expression Nebulas (GEN, https://ngdc.cncb.ac.cn/gen/), an open-access data portal integrating transcriptomic profiles under various biological contexts. GEN features a curated collection of high-quality bulk and single-cell RNA sequencing datasets by using standardized data processing pipelines and a structured curation model. Currently, GEN houses a large number of gene expression profiles from 323 datasets (157 bulk and 166 single-cell), covering 50 500 samples and 15 540 169 cells across 30 species, which are further categorized into six biological contexts. Moreover, GEN integrates a full range of transcriptomic profiles on expression, RNA editing and alternative splicing for 10 bulk datasets, providing opportunities for users to conduct integrative analysis at both transcriptional and post-transcriptional levels. In addition, GEN provides abundant gene annotations based on value-added curation of transcriptomic profiles and delivers online services for data analysis and visualization. Collectively, GEN presents a comprehensive collection of transcriptomic profiles across multiple species, thus serving as a fundamental resource for better understanding genetic regulatory architecture and functional mechanisms from tissues to cells.
Affiliation(s)
- Yuansheng Zhang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Dong Zou
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
| | - Tongtong Zhu
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tianyi Xu
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
| | - Ming Chen
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Guangyi Niu
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wenting Zong
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Rong Pan
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wei Jing
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jian Sang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Chang Liu
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yujia Xiong
- Beijing Neurosurgical Institute, Capital Medical University, Beijing 100069, China
| | - Yubin Sun
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
| | - Shuang Zhai
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
| | - Huanxin Chen
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
| | - Wenming Zhao
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jingfa Xiao
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yiming Bao
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lili Hao
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
| | - Zhang Zhang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
|
32
|
Masoudi-Sobhanzadeh Y, Esmaeili H, Masoudi-Nejad A. A fuzzy logic-based computational method for the repurposing of drugs against COVID-19. BIOIMPACTS : BI 2022; 12:315-324. [PMID: 35975205 PMCID: PMC9376160 DOI: 10.34172/bi.2021.40] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/14/2020] [Revised: 03/27/2021] [Accepted: 04/03/2021] [Indexed: 01/09/2023]
Abstract
Introduction: COVID-19 has spread all around the world and seriously interrupted human activities. Because it is a newly discovered disease, many of its aspects remain unknown and no effective medication is available to cure it. Besides, designing a drug is a time-consuming process that requires a large investment. Hence, drug repurposing techniques, employed to discover the hidden benefits of existing drugs, may be a useful option for treating COVID-19. Methods: The present study exploits drug repositioning concepts and introduces some candidate drugs which may be effective in controlling COVID-19. The suggested method consists of three main steps. First, the required data, such as the amino acid sequences of targets and drug-target interactions, are extracted from public databases. Second, the similarity score between the targets (proteins/enzymes) and the genome of SARS-CoV-2 is computed using the proposed fuzzy logic-based method. Since the classical approaches yield outcomes which may not be useful for real-world applications, the fuzzy technique can address the issue. Third, after ranking targets based on the obtained scores, the usefulness of drugs affecting them is examined for managing COVID-19. Results: The results indicate that antiviral medicines designed for curing hepatitis C may also cure COVID-19. According to the findings, ribavirin, simeprevir, danoprevir, and XTL-6865 may be helpful in controlling the disease. Conclusion: It can be concluded that similarity-based drug repurposing techniques may be the most suitable option for managing emerging diseases such as COVID-19 and can be applied to a wide range of data. Also, fuzzy logic-based scoring methods can produce outcomes which are more consistent with real-world biological applications than others.
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Corresponding authors: Ali Masoudi-Nejad; Yosef Masoudi-Sobhanzadeh
| | - Hosein Esmaeili
- Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Corresponding authors: Ali Masoudi-Nejad; Yosef Masoudi-Sobhanzadeh
|
33
|
Hu K, Liang P. Transcriptome Analysis Reveals Higher Levels of Mobile Element-Associated Abnormal Gene Transcripts in Temporal Lobe Epilepsy Patients. Front Genet 2021; 12:767341. [PMID: 34868252 PMCID: PMC8640520 DOI: 10.3389/fgene.2021.767341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Accepted: 10/25/2021] [Indexed: 11/13/2022] Open
Abstract
Mesial temporal lobe epilepsy (MTLE) is the most common form of epilepsy, and temporal lobe epilepsy patients with hippocampal sclerosis (TLE-HS) show worse drug treatment effects and prognosis. TLE has been shown to have a genetic component, but its genetic research has been mostly limited to coding sequences of genes with known association to epilepsy. Representing a major component of the genome, mobile elements (MEs) are believed to contribute to the genetic etiology of epilepsy despite limited research. We analyzed publicly available human RNA-seq-based transcriptome data to determine the role of mobile elements in epilepsy by performing de novo transcriptome assembly, followed by identification of spliced gene transcripts containing mobile element (ME) sequences (ME-transcripts), to compare their frequency across different sample groups. Significantly higher levels of ME-transcripts were observed in hippocampal tissues of epileptic patients, particularly in TLE-HS. Among ME classes, short interspersed nuclear elements (SINEs) were shown to be the most frequent contributor to ME-transcripts, followed by long interspersed nuclear elements (LINEs) and DNA transposons. In almost all cases, these ME sequences represent older MEs normally located in the intron sequences. For protein-coding genes, ME sequences were mostly found in the 3'-UTR regions, with a significant portion also in the coding sequences (CDSs), leading to reading frame disruption. Genes associated with ME-transcripts showed enrichment for the mRNA splicing process and an apparent bias in epileptic transcriptomes toward neural- and epilepsy-associated genes. The findings of this study suggest that abnormal splicing involving MEs, leading to loss of functions in critical genes, plays a role in epilepsy, particularly in TLE-HS, thus providing a novel insight into the molecular mechanisms underlying epileptogenesis.
Affiliation(s)
- Kai Hu
- Department of Biological Sciences, Brock University, St. Catharines, ON, Canada; Department of Neurology, Xiangya Hospital, Central South University, Changsha, China
| | - Ping Liang
- Department of Biological Sciences, Brock University, St. Catharines, ON, Canada
|
34
|
Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data. Nat Commun 2021; 12:6396. [PMID: 34737285 PMCID: PMC8569188 DOI: 10.1038/s41467-021-26698-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2021] [Accepted: 10/20/2021] [Indexed: 11/09/2022] Open
Abstract
Intratumour heterogeneity provides tumours with the ability to adapt and acquire treatment resistance. The development of more effective and personalised treatments for cancers, therefore, requires accurate characterisation of the clonal architecture of tumours, enabling evolutionary dynamics to be tracked. Many methods exist for achieving this from bulk tumour sequencing data, involving identifying mutations and performing subclonal deconvolution, but there is a lack of systematic benchmarking to inform researchers on which are most accurate, and how dataset characteristics impact performance. To address this, we use the most comprehensive tumour genome simulation tool available for such purposes to create 80 bulk tumour whole exome sequencing datasets of differing depths, tumour complexities, and purities, and use these to benchmark subclonal deconvolution pipelines. We conclude that i) tumour complexity does not impact accuracy, ii) increasing either purity or purity-corrected sequencing depth improves accuracy, and iii) the optimal pipeline consists of Mutect2, FACETS and PyClone-VI. We have made our benchmarking datasets publicly available for future use.
|
35
|
Rahimi K, Venø MT, Dupont DM, Kjems J. Nanopore sequencing of brain-derived full-length circRNAs reveals circRNA-specific exon usage, intron retention and microexons. Nat Commun 2021; 12:4825. [PMID: 34376658 PMCID: PMC8355340 DOI: 10.1038/s41467-021-24975-z] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2021] [Accepted: 07/16/2021] [Indexed: 12/17/2022] Open
Abstract
Circular RNA (circRNA) is a class of covalently joined non-coding RNAs with functional roles in a wide variety of cellular processes. Their composition shows extensive overlap with exons found in linear mRNAs making it difficult to delineate their composition using short-read RNA sequencing, particularly for long and multi-exonic circRNAs. Here, we use long-read nanopore sequencing of nicked circRNAs (circNick-LRS) and characterize a total of 18,266 and 39,623 circRNAs in human and mouse brain, respectively. We further develop an approach for targeted long-read sequencing of a panel of circRNAs (circPanel-LRS), eliminating the need for prior circRNA enrichment and find >30 circRNA isoforms on average per targeted locus. Our data show that circRNAs exhibit a large number of splicing events such as novel exons, intron retention and microexons that preferentially occur in circRNAs. We propose that altered exon usage in circRNAs may reflect resistance to nonsense-mediated decay in the absence of translation.
Affiliation(s)
- Karim Rahimi
- Department of Molecular Biology and Genetics (MBG), Aarhus University, Aarhus, Denmark
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus, Denmark
| | - Morten T Venø
- Department of Molecular Biology and Genetics (MBG), Aarhus University, Aarhus, Denmark
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus, Denmark
- Omiics ApS, Aarhus, Denmark
| | - Daniel M Dupont
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus, Denmark
| | - Jørgen Kjems
- Department of Molecular Biology and Genetics (MBG), Aarhus University, Aarhus, Denmark
- Interdisciplinary Nanoscience Center (iNANO), Aarhus University, Aarhus, Denmark
|
36
|
Hong S, Lim YP, Kwon SY, Shin AY, Kim YM. Genome-Wide Comparative Analysis of Flowering-Time Genes; Insights on the Gene Family Expansion and Evolutionary Perspective. FRONTIERS IN PLANT SCIENCE 2021; 12:702243. [PMID: 34290729 PMCID: PMC8288248 DOI: 10.3389/fpls.2021.702243] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/30/2021] [Accepted: 06/14/2021] [Indexed: 05/03/2023]
Abstract
In polyploids, whole genome duplication (WGD) played a significant role in genome expansion, evolution and diversification. Many gene families are expanded following polyploidization, with the duplicated genes functionally diversified by neofunctionalization or subfunctionalization. These mechanisms may support adaptation and have likely contributed to plant survival during evolution. Flowering time is an important trait in plants, which affects critical features, such as crop yields. The flowering-time gene family is one of the largest expanded gene families in plants, with its members playing various roles in plant development. Here, we performed genome-wide identification and comparative analysis of flowering-time genes in three plant families, i.e., Malvaceae, Brassicaceae, and Solanaceae, which indicate that these genes were expanded following the event/s of polyploidization. Duplicated genes have been retained during evolution, although genome reorganization occurred in their flanking regions. Further investigation of sequence conservation and similarity network analyses provide evidence for functional diversification of duplicated genes during evolution. These functionally diversified genes play important roles in plant development and provide advantages to plants for adaptation and survival in response to environmental changes encountered during evolution. Collectively, we show that flowering-time genes were expanded following polyploidization and retained as a large gene family by providing advantages from functional diversification during evolution.
Affiliation(s)
- Seongmin Hong
- Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, South Korea
- Molecular Genetics and Genomics Laboratory, Department of Horticulture, College of Agriculture and Life Sciences, Chungnam National University, Daejeon, South Korea
| | - Yong Pyo Lim
- Molecular Genetics and Genomics Laboratory, Department of Horticulture, College of Agriculture and Life Sciences, Chungnam National University, Daejeon, South Korea
| | - Suk-Yoon Kwon
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, South Korea
| | - Ah-Young Shin
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, South Korea
| | - Yong-Min Kim
- Genome Editing Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, South Korea
|
37
|
Nanopore Sequencing and Hi-C Based De Novo Assembly of Trachidermus fasciatus Genome. Genes (Basel) 2021; 12:genes12050692. [PMID: 34066304 PMCID: PMC8148166 DOI: 10.3390/genes12050692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2021] [Revised: 04/25/2021] [Accepted: 04/27/2021] [Indexed: 11/17/2022] Open
Abstract
Trachidermus fasciatus is a roughskin sculpin fish widespread across the coastal areas of East Asia. Due to environmental destruction and overfishing, the population of this species is under threat. In order to protect this endangered species, it is important to have the genome sequenced. Reference genomes are essential for studying population genetics, domestic farming, and genetic resource protection. However, currently, no reference genome is available for Trachidermus fasciatus, and this has greatly hindered the research on this species. In this study, we integrated nanopore long-read sequencing, Illumina short-read sequencing, and Hi-C methods to thoroughly assemble the Trachidermus fasciatus genome. Our results provided a chromosome-level high-quality genome assembly with a predicted genome size of 542.6 Mbp (2n = 40) and a scaffold N50 of 24.9 Mbp. The BUSCO value for genome assembly completeness was higher than 96%, and the single-base accuracy was 99.997%. Based on EVM-StringTie genome annotation, a total of 19,147 protein-coding genes were identified, including 35,093 mRNA transcripts. In addition, a novel gene-finding strategy named RNR was introduced, and in total, 51 (82) novel genes (transcripts) were identified. Lastly, we present here the first reference genome for Trachidermus fasciatus; this sequence is expected to greatly facilitate future research on this species.
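For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of scaffold lengths. The function name and the toy lengths are illustrative assumptions, not values or code from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Toy lengths, purely illustrative.
    print(n50([2, 3, 4, 5, 6]))
```

In practice the lengths would be read from the assembly FASTA or its index file rather than typed in by hand.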
|
38
|
Aury JM, Istace B. Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads. NAR Genom Bioinform 2021; 3:lqab034. [PMID: 33987534 PMCID: PMC8092372 DOI: 10.1093/nargab/lqab034] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Revised: 03/18/2021] [Accepted: 04/13/2021] [Indexed: 12/11/2022] Open
Abstract
Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (kilobases to megabases order) and then, using efficient algorithms, provide high quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture between the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads and generally, they inspect the nucleotide one by one, and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we proposed Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long-reads) to polish genome assemblies and in particular assemblies of diploid and heterozygous genomes.
Affiliation(s)
- Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057 Evry, France
- Benjamin Istace
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057 Evry, France
|
39
|
Gomes-Dos-Santos A, Lopes-Lima M, Machado AM, Marcos Ramos A, Usié A, Bolotov IN, Vikhrev IV, Breton S, Castro LFC, da Fonseca RR, Geist J, Österling ME, Prié V, Teixeira A, Gan HM, Simakov O, Froufe E. The Crown Pearl: a draft genome assembly of the European freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758). DNA Res 2021; 28:6182681. [PMID: 33755103 PMCID: PMC8088596 DOI: 10.1093/dnares/dsab002] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/07/2020] [Accepted: 03/22/2021] [Indexed: 11/17/2022]
Abstract
Since historical times, the inherent human fascination with pearls turned the freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758) into a highly valuable cultural and economic resource. Although pearl harvesting in M. margaritifera is nowadays residual, other human threats have aggravated the species conservation status, especially in Europe. This mussel presents a myriad of rare biological features, e.g. high longevity coupled with low senescence and Doubly Uniparental Inheritance of mitochondrial DNA, for which the underlying molecular mechanisms are poorly known. Here, the first draft genome assembly of M. margaritifera was produced using a combination of Illumina Paired-end and Mate-pair approaches. The genome assembly was 2.4 Gb long, possessing 105,185 scaffolds and a scaffold N50 length of 288,726 bp. The ab initio gene prediction allowed the identification of 35,119 protein-coding genes. This genome represents an essential resource for studying this species’ unique biological and evolutionary features and ultimately will help to develop new tools to promote its conservation.
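The scaffold N50 reported in this abstract is the length L such that scaffolds of length at least L together cover at least half of the total assembly length. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are illustrative assumptions, not data or code from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper (hypothetical name): sorts lengths from longest to
    shortest and returns the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Made-up scaffold lengths, for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about correctness; a misjoined scaffold raises N50 just as a correct join does.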
Affiliation(s)
- André Gomes-Dos-Santos
- CIIMAR/CIMAR-Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, S/N, P 4450-208 Matosinhos, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Manuel Lopes-Lima
- CIIMAR/CIMAR-Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, S/N, P 4450-208 Matosinhos, Portugal; CIBIO/InBIO-Research Center in Biodiversity and Genetic Resources, Universidade do Porto, Campus Agrário de Vairão, Rua Padre Armando Quintas, 4485-661 Vairão, Portugal; IUCN SSC Mollusc Specialist Group, c/o IUCN, Cambridge, England
- André M Machado
- CIIMAR/CIMAR-Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, S/N, P 4450-208 Matosinhos, Portugal
- António Marcos Ramos
- Centro de Biotecnologia Agrícola e Agro-alimentar do Alentejo (CEBAL), Instituto Politécnico de Beja (IPBeja), 7801-908 Beja, Portugal; MED-Mediterranean Institute for Agriculture, Environment and Development, CEBAL-Centro de Biotecnologia Agrícola e Agro-Alimentar do Alentejo, 7801-908 Beja, Portugal
- Ana Usié
- Centro de Biotecnologia Agrícola e Agro-alimentar do Alentejo (CEBAL), Instituto Politécnico de Beja (IPBeja), 7801-908 Beja, Portugal; MED-Mediterranean Institute for Agriculture, Environment and Development, CEBAL-Centro de Biotecnologia Agrícola e Agro-Alimentar do Alentejo, 7801-908 Beja, Portugal
- Ivan N Bolotov
- Federal Center for Integrated Arctic Research, Russian Academy of Sciences, Arkhangelsk 163000, Russia
- Ilya V Vikhrev
- Federal Center for Integrated Arctic Research, Russian Academy of Sciences, Arkhangelsk 163000, Russia
- Sophie Breton
- Department of Biological Sciences, University of Montreal, Montreal, Canada
- L Filipe C Castro
- CIIMAR/CIMAR-Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, S/N, P 4450-208 Matosinhos, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Rute R da Fonseca
- Center for Macroecology, Evolution and Climate, GLOBE Institute, University of Copenhagen, 2100 Copenhagen, Denmark
- Juergen Geist
- Aquatic Systems Biology Unit, Technical University of Munich, TUM School of Life Sciences, D-85354 Freising, Germany
- Martin E Österling
- Department of Environmental and Life Sciences-Biology, Karlstad University, 651 88 Karlstad, Sweden
- Vincent Prié
- Research Associate, Institute of Systematics, Evolution, Biodiversity (ISYEB), National Museum of Natural History (MNHN), CNRS, SU, EPHE, 75005 Paris, France
- Amílcar Teixeira
- Centro de Investigação de Montanha (CIMO), Instituto Politécnico de Bragança, Bragança, Portugal
- Han Ming Gan
- GeneSEQ Sdn Bhd, Bandar Bukit Beruntung, Rawang 48300, Selangor, Malaysia
- Oleg Simakov
- Department of Neurosciences and Developmental Biology, University of Vienna, 1010 Vienna, Austria
- Elsa Froufe
- CIIMAR/CIMAR-Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos, S/N, P 4450-208 Matosinhos, Portugal
|
40
|
Zou Y, Zhu Y, Li Y, Wu FX, Wang J. Parallel computing for genome sequence processing. Brief Bioinform 2021; 22:6210355. [PMID: 33822883 DOI: 10.1093/bib/bbab070] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/04/2020] [Revised: 01/26/2021] [Accepted: 02/10/2021] [Indexed: 01/08/2023]
Abstract
The rapid increase of genome data brought by gene sequencing technologies poses a massive challenge to data processing. To solve the problems caused by enormous data and complex computing requirements, researchers have proposed many methods and tools which can be divided into three types: big data storage, efficient algorithm design and parallel computing. The purpose of this review is to investigate popular parallel programming technologies for genome sequence processing. Three common parallel computing models are introduced according to their hardware architectures, and each of which is classified into two or three types and is further analyzed with their features. Then, the parallel computing for genome sequence processing is discussed with four common applications: genome sequence alignment, single nucleotide polymorphism calling, genome sequence preprocessing, and pattern detection and searching. For each kind of application, its background is firstly introduced, and then a list of tools or algorithms are summarized in the aspects of principle, hardware platform and computing efficiency. The programming model of each hardware and application provides a reference for researchers to choose high-performance computing tools. Finally, we discuss the limitations and future trends of parallel computing technologies.
Affiliation(s)
- You Zou
- Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, China
- Yuejie Zhu
- Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, China
- Yaohang Li
- computer science at Old Dominion University, USA
- Fang-Xiang Wu
- College of Engineering and the Department of Computer Science at the University of Saskatchewan, Saskatoon, Canada
- Jianxin Wang
- School of Computer Science and Engineering at Central South University, Changsha, Hunan, China
|
41
|
Olazcuaga L, Loiseau A, Parrinello H, Paris M, Fraimout A, Guedot C, Diepenbrock LM, Kenis M, Zhang J, Chen X, Borowiec N, Facon B, Vogt H, Price DK, Vogel H, Prud'homme B, Estoup A, Gautier M. A Whole-Genome Scan for Association with Invasion Success in the Fruit Fly Drosophila suzukii Using Contrasts of Allele Frequencies Corrected for Population Structure. Mol Biol Evol 2021; 37:2369-2385. [PMID: 32302396 PMCID: PMC7403613 DOI: 10.1093/molbev/msaa098] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Indexed: 02/06/2023]
Abstract
Evidence is accumulating that evolutionary changes are not only common during biological invasions but may also contribute directly to invasion success. The genomic basis of such changes is still largely unexplored. Yet, understanding the genomic response to invasion may help to predict the conditions under which invasiveness can be enhanced or suppressed. Here, we characterized the genome response of the spotted wing drosophila Drosophila suzukii during the worldwide invasion of this pest insect species, by conducting a genome-wide association study to identify genes involved in adaptive processes during invasion. Genomic data from 22 population samples were analyzed to detect genetic variants associated with the status (invasive versus native) of the sampled populations based on a newly developed statistic, we called C2, that contrasts allele frequencies corrected for population structure. We evaluated this new statistical framework using simulated data sets and implemented it in an upgraded version of the program BayPass. We identified a relatively small set of single-nucleotide polymorphisms that show a highly significant association with the invasive status of D. suzukii populations. In particular, two genes, RhoGEF64C and cpo, contained single-nucleotide polymorphisms significantly associated with the invasive status in the two separate main invasion routes of D. suzukii. Our methodological approaches can be applied to any other invasive species, and more generally to any evolutionary model for species characterized by nonequilibrium demographic conditions for which binary covariables of interest can be defined at the population level.
Affiliation(s)
- Laure Olazcuaga
- INRAE, UMR CBGP (INRAE-IRD-Cirad - Montpellier SupAgro), Montferrier-sur-Lez, France
- Anne Loiseau
- INRAE, UMR CBGP (INRAE-IRD-Cirad - Montpellier SupAgro), Montferrier-sur-Lez, France
- Hugues Parrinello
- MGX, Biocampus Montpellier, CNRS, INSERM, Universite de Montpellier, Montpellier, France
- Antoine Fraimout
- INRAE, UMR CBGP (INRAE-IRD-Cirad - Montpellier SupAgro), Montferrier-sur-Lez, France
- Jinping Zhang
- MoA-CABI Joint Laboratory for Bio-Safety, Chinese Academy of Agricultural Sciences, BeiXiaGuan, Haidian Qu, China
- Xiao Chen
- College of Plant Protection, Yunnan Agricultural University, Kunming, Yunnan Province, China
- Nicolas Borowiec
- UMR INRAE-CNRS-Université Côte d'Azur Sophia Agrobiotech Institute, Sophia Antipolis, France
- Benoit Facon
- UMR Peuplements Végétaux et Bioagresseurs en Milieu Tropical, INRAE, Saint-Pierre, La Réunion, France
- Heidrun Vogt
- Julius Kühn-Institut (JKI), Federal Research Centre for Cultivated Plants, Institute for Plant Protection in Fruit Crops and Viticulture, Dossenheim, Germany
- Donald K Price
- School of Life Sciences, University of Nevada, Las Vegas, Las Vegas, NV
- Heiko Vogel
- Department of Entomology, Max Planck Institute for Chemical Ecology, Jena, Germany
- Arnaud Estoup
- INRAE, UMR CBGP (INRAE-IRD-Cirad - Montpellier SupAgro), Montferrier-sur-Lez, France
- Mathieu Gautier
- INRAE, UMR CBGP (INRAE-IRD-Cirad - Montpellier SupAgro), Montferrier-sur-Lez, France
|
42
|
Howe K, Chow W, Collins J, Pelan S, Pointon DL, Sims Y, Torrance J, Tracey A, Wood J. Significantly improving the quality of genome assemblies through curation. Gigascience 2021; 10:giaa153. [PMID: 33420778 PMCID: PMC7794651 DOI: 10.1093/gigascience/giaa153] [Citation(s) in RCA: 865] [Impact Index Per Article: 216.3] [Received: 08/12/2020] [Revised: 11/17/2020] [Accepted: 11/30/2020] [Indexed: 11/29/2022]
Abstract
Genome sequence assemblies provide the basis for our understanding of biology. Generating error-free assemblies is therefore the ultimate, but sadly still unachieved goal of a multitude of research projects. Despite the ever-advancing improvements in data generation, assembly algorithms and pipelines, no automated approach has so far reliably generated near error-free genome assemblies for eukaryotes. Whilst working towards improved datasets and fully automated pipelines, assembly evaluation and curation is actively used to bridge this shortcoming and significantly reduce the number of assembly errors. In addition to this increase in product value, the insights gained from assembly curation are fed back into the automated assembly strategy and contribute to notable improvements in genome assembly quality. We describe our tried and tested approach for assembly curation using gEVAL, the genome evaluation browser. We outline the procedures applied to genome curation using gEVAL and also our recommendations for assembly curation in a gEVAL-independent context to facilitate the uptake of genome curation in the wider community.
Affiliation(s)
- Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- William Chow
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Joanna Collins
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Sarah Pelan
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Ying Sims
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- James Torrance
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Alan Tracey
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
- Jonathan Wood
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
|
43
|
Abstract
The advent of deep sequencing technologies has greatly improved the study of complex eukaryotic genomes and transcriptomes, allowing the investigation of posttranscriptional molecular mechanisms such as alternative splicing and RNA editing at unprecedented throughput and resolution. The most prevalent type of RNA editing in higher eukaryotes is the deamination of adenosine to inosine (A-to-I) in double-stranded RNAs. Depending on the RNA type or the RNA region involved, A-to-I RNA editing contributes to transcriptome and proteome diversity. Here, we present an easy and reproducible computational protocol for the identification of candidate RNA editing sites in humans using deep transcriptome (RNA-Seq) and genome (DNA-Seq) sequencing.
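Inosine is read as guanosine by sequencers, so an A-to-I event shows up as an A-to-G mismatch in RNA-Seq reads at a position where the matched DNA-Seq reads support only A. The sketch below illustrates that comparison for forward-strand sites only. The `SiteCounts` type, the function names, and every threshold are illustrative assumptions and are not taken from the protocol described in this abstract.

```python
from dataclasses import dataclass, field


@dataclass
class SiteCounts:
    """Per-position base counts (hypothetical structure for illustration)."""
    ref_base: str
    rna: dict = field(default_factory=dict)  # base -> RNA-Seq read count
    dna: dict = field(default_factory=dict)  # base -> DNA-Seq read count


def editing_level(site):
    """Fraction of A/G-supporting RNA reads that carry G (read as inosine)."""
    a_reads = site.rna.get("A", 0)
    g_reads = site.rna.get("G", 0)
    covered = a_reads + g_reads
    return g_reads / covered if covered else 0.0


def is_candidate(site, min_cov=10, min_level=0.1, max_dna_g=0):
    """Flag a forward-strand A-to-G candidate; thresholds are assumed values."""
    if site.ref_base != "A":
        return False
    rna_cov = site.rna.get("A", 0) + site.rna.get("G", 0)
    dna_cov = sum(site.dna.values())
    if rna_cov < min_cov or dna_cov < min_cov:
        return False
    # G-supporting DNA reads suggest a genomic SNP rather than RNA editing.
    if site.dna.get("G", 0) > max_dna_g:
        return False
    return editing_level(site) >= min_level


edited = SiteCounts("A", rna={"A": 80, "G": 20}, dna={"A": 30})
snp = SiteCounts("A", rna={"A": 50, "G": 50}, dna={"A": 15, "G": 15})
```

A real analysis would also handle minus-strand transcripts (where the same event appears as T-to-C), base and mapping quality, and known SNP databases; none of that is modelled here.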
Affiliation(s)
- Claudio Lo Giudice
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
- Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari, Bari, Italy
- Ernesto Picardi
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari, Bari, Italy
|
44
|
De Novo A-to-I RNA Editing Discovery in lncRNA. Cancers (Basel) 2020; 12:cancers12102959. [PMID: 33066171 PMCID: PMC7650826 DOI: 10.3390/cancers12102959] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/14/2020] [Revised: 09/18/2020] [Accepted: 10/09/2020] [Indexed: 12/11/2022]
Abstract
Simple Summary: Long non-coding RNAs are emerging as key regulators of gene expression at both transcriptional and translational levels, and their alterations (in expression or sequence) are linked to tumorigenesis and tumor progression. RNA editing has the unique ability to change the RNA sequence without altering the integrity or sequence of genomic DNA, with adenosine to inosine (A-to-I) RNA editing being the most common event in humans. With the ability to change the genetic information after transcription, RNA editing is an essential player in the transcriptome and proteome enrichment; however, when deregulated, it can contribute to cell transformation. In this article, we performed the first deep de novo editing survey in lncRNA, demonstrating that RNA editing is a pervasive phenomenon involving lncRNAs important in the brain and brain cancer. Our study will open a new field of research in which the interplay between lncRNA and RNA editing can add novel insights into cancer.
Background: Adenosine to inosine (A-to-I) RNA editing is the most frequent editing event in humans. It converts adenosine to inosine in double-stranded RNA regions (in coding and non-coding RNAs) through the action of the adenosine deaminase acting on RNA (ADAR) enzymes. Long non-coding RNAs, particularly abundant in the brain, account for a large fraction of the human transcriptome, and their important regulatory role is becoming progressively evident in both normal and transformed cells.
Results: Herein, we present a bioinformatic analysis to generate a comprehensive inosinome picture in long non-coding RNAs (lncRNAs), using an ad hoc index and searching for de novo editing events in the normal brain cortex as well as in glioblastoma, a highly aggressive human brain cancer. We discovered >10,000 new sites and 335 novel lncRNAs that undergo editing, never reported before. We found a generalized downregulation of editing at multiple lncRNA sites in glioblastoma samples when compared to the normal brain cortex.
Conclusion: Overall, our study discloses a novel layer of complexity that controls lncRNAs in the brain and brain cancer.
|
45
|
Kadota M, Nishimura O, Miura H, Tanaka K, Hiratani I, Kuraku S. Multifaceted Hi-C benchmarking: what makes a difference in chromosome-scale genome scaffolding? Gigascience 2020; 9:5695848. [PMID: 31919520 PMCID: PMC6952475 DOI: 10.1093/gigascience/giz158] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 06/06/2019] [Revised: 10/23/2019] [Accepted: 12/02/2019] [Indexed: 12/28/2022]
Abstract
Background: Hi-C is derived from chromosome conformation capture (3C) and targets chromatin contacts on a genomic scale. This method has also been used frequently in scaffolding nucleotide sequences obtained by de novo genome sequencing and assembly, in which the number of resultant sequences rarely converges to the chromosome number. Despite its prevalent use, the sample preparation methods for Hi-C have not been intensively discussed, especially from the standpoint of genome scaffolding. Results: To gain insight into the best practice of Hi-C scaffolding, we performed a multifaceted methodological comparison using vertebrate samples and optimized various factors during sample preparation, sequencing, and computation. As a result, we identified several key factors that helped improve Hi-C scaffolding, including the choice and preparation of tissues, library preparation conditions, the choice of restriction enzyme(s), and the choice of scaffolding program and its usage. Conclusions: This study provides the first comparison of multiple sample preparation kits/protocols and computational programs for Hi-C scaffolding by an academic third party. We introduce a customized protocol designated “inexpensive and controllable Hi-C (iconHi-C) protocol,” which incorporates the optimal conditions identified in this study, and demonstrate this technique on chromosome-scale genome sequences of the Chinese softshell turtle Pelodiscus sinensis.
Affiliation(s)
- Mitsutaka Kadota
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
- Osamu Nishimura
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
- Hisashi Miura
- Laboratory for Developmental Epigenetics, RIKEN BDR, Kobe 650-0047, Japan
- Kaori Tanaka
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
- Ichiro Hiratani
- Laboratory for Developmental Epigenetics, RIKEN BDR, Kobe 650-0047, Japan
- Shigehiro Kuraku
- Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research (BDR), Kobe 650-0047, Japan
|
46
|
Franke KR, Crowgey EL. Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms. Genomics Inform 2020; 18:e10. [PMID: 32224843 PMCID: PMC7120354 DOI: 10.5808/gi.2020.18.1.e10] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 02/26/2020] [Accepted: 03/06/2020] [Indexed: 12/30/2022]
Abstract
Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of software and hardware used to compute large scale human genomic datasets (NGS), the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared to the results of the original application (GATK v4.1.0, Intel x86 CPUs). Parabricks was able to process a 50× whole-genome sequencing library in under 3 h and Sentieon finished in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision compared to stock GATK. Sentieon’s somatic pipeline achieved similar results greater than 99%. Additionally, the IBM POWER9 CPU performed well on bioinformatic workloads when tested with 10 different tools for alignment/mapping.
Affiliation(s)
- Karl R Franke
- Department of Pediatrics, Nemours Alfred I duPont Hospital for Children, Wilmington, DE 19803, USA
- Erin L Crowgey
- Department of Pediatrics, Nemours Alfred I duPont Hospital for Children, Wilmington, DE 19803, USA
|
47
|
Lo Giudice C, Tangaro MA, Pesole G, Picardi E. Investigating RNA editing in deep transcriptome datasets with REDItools and REDIportal. Nat Protoc 2020; 15:1098-1131. [PMID: 31996844 DOI: 10.1038/s41596-019-0279-7] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Received: 06/18/2019] [Accepted: 12/05/2019] [Indexed: 12/14/2022]
Abstract
RNA editing is a widespread post-transcriptional mechanism able to modify transcripts through insertions/deletions or base substitutions. It is prominent in mammals, in which millions of adenosines are deaminated to inosines by members of the ADAR family of enzymes. A-to-I RNA editing has a plethora of biological functions, but its detection in large-scale transcriptome datasets is still an unsolved computational task. To this aim, we developed REDItools, the first software package devoted to the RNA editing profiling in RNA-sequencing (RNAseq) data. It has been successfully used in human transcriptomes, proving the tissue and cell type specificity of RNA editing as well as its pervasive nature. Outcomes from large-scale REDItools analyses on human RNAseq data have been collected in our specialized REDIportal database, containing more than 4.5 million events. Here we describe in detail two bioinformatic procedures based on our computational resources, REDItools and REDIportal. In the first procedure, we outline a workflow to detect RNA editing in the human cell line NA12878, for which transcriptome and whole genome data are available. In the second procedure, we show how to identify dysregulated editing at specific recoding sites in post-mortem brain samples of Huntington disease donors. On a 64-bit computer running Linux with ≥32 GB of random-access memory (RAM), both procedures should take ~76 h, using 4 to 24 cores. Our protocols have been designed to investigate RNA editing in different organisms with available transcriptomic and/or genomic reads. Scripts to complete both procedures and a docker image are available at https://github.com/BioinfoUNIBA/REDItools.
Affiliation(s)
- Claudio Lo Giudice
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), National Research Council, Bari, Italy
- Marco Antonio Tangaro
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), National Research Council, Bari, Italy
- Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), National Research Council, Bari, Italy; Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari, Bari, Italy; National Institute of Biostructures and Biosystems (INBB), Rome, Italy
- Ernesto Picardi
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), National Research Council, Bari, Italy; Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari, Bari, Italy; National Institute of Biostructures and Biosystems (INBB), Rome, Italy
|
48
|
HRCM: An Efficient Hybrid Referential Compression Method for Genomic Big Data. BIOMED RESEARCH INTERNATIONAL 2020; 2019:3108950. [PMID: 31915686 PMCID: PMC6930768 DOI: 10.1155/2019/3108950] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/16/2019] [Revised: 09/14/2019] [Accepted: 10/22/2019] [Indexed: 12/22/2022]
Abstract
With the maturation of genome sequencing technology, huge numbers of sequence reads and assembled genomes are being generated. With this explosive growth, the storage and transmission of genomic data face enormous challenges. FASTA, one of the main storage formats for genome sequences, is widely used in the Gene Bank because it eases sequence analysis and gene research and is easy to read. Many compression methods for FASTA genome sequences have been proposed, but they still have room for improvement: for example, their compression ratio and speed are not high or robust enough, and their memory consumption is not ideal. It is therefore of great significance to improve the efficiency, robustness, and practicability of genomic data compression, to further reduce the storage and transmission cost of genomic data and to promote the research and development of genomic technology. In this manuscript, a hybrid referential compression method (HRCM) for FASTA genome sequences is proposed. HRCM is a lossless compression method able to compress a single sequence as well as large collections of sequences. It is implemented in three stages: sequence information extraction, sequence information matching, and sequence information encoding. A large number of experiments were run to evaluate the performance of HRCM. Experimental verification shows that HRCM is superior to the best-known methods in genome batch compression. Moreover, HRCM's memory consumption is relatively low, and it can be deployed on standard PCs.
|
49
|
Clark MB, Wrzesinski T, Garcia AB, Hall NAL, Kleinman JE, Hyde T, Weinberger DR, Harrison PJ, Haerty W, Tunbridge EM. Long-read sequencing reveals the complex splicing profile of the psychiatric risk gene CACNA1C in human brain. Mol Psychiatry 2020; 25:37-47. [PMID: 31695164 PMCID: PMC6906184 DOI: 10.1038/s41380-019-0583-1] [Citation(s) in RCA: 88] [Impact Index Per Article: 17.6] [Received: 08/12/2019] [Revised: 10/02/2019] [Accepted: 10/25/2019] [Indexed: 01/22/2023]
Abstract
RNA splicing is a key mechanism linking genetic variation with psychiatric disorders. Splicing profiles are particularly diverse in brain and difficult to accurately identify and quantify. We developed a new approach to address this challenge, combining long-range PCR and nanopore sequencing with a novel bioinformatics pipeline. We identify the full-length coding transcripts of CACNA1C in human brain. CACNA1C is a psychiatric risk gene that encodes the voltage-gated calcium channel CaV1.2. We show that CACNA1C's transcript profile is substantially more complex than appreciated, identifying 38 novel exons and 241 novel transcripts. Importantly, many of the novel variants are abundant, and predicted to encode channels with altered function. The splicing profile varies between brain regions, especially in cerebellum. We demonstrate that human transcript diversity (and thereby protein isoform diversity) remains under-characterised, and provide a feasible and cost-effective methodology to address this. A detailed understanding of isoform diversity will be essential for the translation of psychiatric genomic findings into pathophysiological insights and novel psychopharmacological targets.
Affiliation(s)
- Michael B. Clark
- Department of Psychiatry, University of Oxford, Oxford, UK; Centre for Stem Cell Systems, Department of Anatomy and Neuroscience, The University of Melbourne, Melbourne, VIC, Australia
- Aintzane B. Garcia
- Department of Psychiatry, University of Oxford, Oxford, UK
- Nicola A. L. Hall
- Department of Psychiatry, University of Oxford, Oxford, UK
- Joel E. Kleinman
- The Lieber Institute for Brain Development, Baltimore, MD, USA
- Thomas Hyde
- The Lieber Institute for Brain Development, Baltimore, MD, USA
- Paul J. Harrison
- Department of Psychiatry, University of Oxford, Oxford, UK; Oxford Health NHS Foundation Trust, Oxford, UK
- Elizabeth M. Tunbridge
- Department of Psychiatry, University of Oxford, Oxford, UK; Oxford Health NHS Foundation Trust, Oxford, UK
|
50
|
Robinson JA, Belsare S, Birnbaum S, Newman DE, Chan J, Glenn JP, Ferguson B, Cox LA, Wall JD. Analysis of 100 high-coverage genomes from a pedigreed captive baboon colony. Genome Res 2019; 29:848-856. [PMID: 30926611 PMCID: PMC6499309 DOI: 10.1101/gr.247122.118] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/03/2018] [Accepted: 03/21/2019] [Indexed: 12/21/2022]
Abstract
Baboons (genus Papio) are broadly studied in the wild and in captivity. They are widely used as a nonhuman primate model for biomedical studies, and the Southwest National Primate Research Center (SNPRC) at Texas Biomedical Research Institute has maintained a large captive baboon colony for more than 50 yr. Unlike other model organisms, however, the genomic resources for baboons are severely lacking. This has hindered the progress of studies using baboons as a model for basic biology or human disease. Here, we describe a data set of 100 high-coverage whole-genome sequences obtained from the mixed colony of olive (P. anubis) and yellow (P. cynocephalus) baboons housed at the SNPRC. These data provide a comprehensive catalog of common genetic variation in baboons, as well as a fine-scale genetic map. We show how the data can be used to learn about ancestry and admixture and to correct errors in the colony records. Finally, we investigated the consequences of inbreeding within the SNPRC colony and found clear evidence for increased rates of infant mortality and increased homozygosity of putatively deleterious alleles in inbred individuals.
Affiliation(s)
- Jacqueline A Robinson
- Institute for Human Genetics, University of California, San Francisco, California 94143, USA
- Saurabh Belsare
- Institute for Human Genetics, University of California, San Francisco, California 94143, USA
- Shifra Birnbaum
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas 78245, USA
- Deborah E Newman
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas 78245, USA
- Jeannie Chan
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas 78245, USA
- Jeremy P Glenn
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas 78245, USA
- Betsy Ferguson
- Division of Genetics, Oregon National Primate Research Center, Beaverton, Oregon 97006, USA; Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, Oregon 97239, USA
- Laura A Cox
- Center for Precision Medicine, Department of Internal Medicine, Section of Molecular Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina 27101, USA; Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, Texas 78245, USA
- Jeffrey D Wall
- Institute for Human Genetics, University of California, San Francisco, California 94143, USA
|