1
van der Toorn W, Bohn P, Liu-Wei W, Olguin-Nava M, Gribling-Burrer AS, Smyth RP, von Kleist M. Demultiplexing and barcode-specific adaptive sampling for nanopore direct RNA sequencing. Nat Commun 2025; 16:3742. [PMID: 40258808 PMCID: PMC12012114 DOI: 10.1038/s41467-025-59102-9] [Received: 08/19/2024] [Accepted: 04/10/2025] [Indexed: 04/23/2025] Open
Abstract
Nanopore direct RNA sequencing (dRNA-seq) enables unique insights into RNA biology. However, applications are currently limited by the lack of accurate and cost-effective sample multiplexing. Here we introduce WarpDemuX, an ultra-fast and highly accurate adapter-barcoding and demultiplexing approach for dRNA-seq with SQK-RNA002 and SQK-RNA004 chemistries. WarpDemuX enhances speed and accuracy by fast processing of the raw nanopore signal, use of a lightweight machine-learning algorithm and design of optimized barcode sets. We demonstrate its utility by performing rapid phenotypic profiling of different SARS-CoV-2 viruses through multiplexed sequencing of longitudinal samples on a single flowcell, identifying systematic differences in transcript abundance and poly(A) tail lengths during infection. Additionally, integrating WarpDemuX into sequencing control software enables real-time enrichment of target molecules through barcode-specific adaptive sampling, which we demonstrate by enriching low-abundance viral RNA. In summary, WarpDemuX represents a broadly applicable, high-performance, economical multiplexing solution for dRNA-seq, facilitating advanced (epi-)transcriptomic research.
Affiliation(s)
- Wiep van der Toorn
  - Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Patrick Bohn
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
- Wang Liu-Wei
  - Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
  - International Max-Planck Research School for Biology and Computing (IMPRS-BAC), Berlin, Germany
- Marco Olguin-Nava
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
- Anne-Sophie Gribling-Burrer
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
  - Architecture et Réactivité de l'ARN, Institut de biologie moléculaire et cellulaire du CNRS, Université de Strasbourg, UPR 9002, Strasbourg, France
- Redmond P Smyth
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
  - Architecture et Réactivité de l'ARN, Institut de biologie moléculaire et cellulaire du CNRS, Université de Strasbourg, UPR 9002, Strasbourg, France
  - Department of Medicine, University of Würzburg, Würzburg, Germany
- Max von Kleist
  - Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
2
Gillani M, Pollastri G. Impact of Alignments on the Accuracy of Protein Subcellular Localization Predictions. Proteins 2025; 93:745-759. [PMID: 39575640 PMCID: PMC11809130 DOI: 10.1002/prot.26767] [Received: 07/02/2024] [Revised: 10/01/2024] [Accepted: 11/01/2024] [Indexed: 02/11/2025]
Abstract
Alignments in bioinformatics refer to the arrangement of sequences to identify regions of similarity that can indicate functional, structural, or evolutionary relationships. They are crucial for bioinformaticians as they enable accurate predictions and analyses in various applications, including protein subcellular localization. The predictive model used in this article is based on a deep convolutional architecture. We tested configurations of Deep N-to-1 convolutional neural networks of various depths and widths during experimentation for the evaluation of better-performing values across a diverse set of eight classes. For the assessment without alignments, sequences are encoded using one-hot encoding, converting each character into a numerical representation, which is straightforward for non-numerical data and useful for machine learning models. For the assessment with alignments, multiple sequence alignments (MSAs) are created using PSI-BLAST, capturing evolutionary information by calculating frequencies of residues and gaps. The average difference in peak performance between models with alignments and without alignments is approximately 15.82%. The average difference in the highest accuracy achieved with alignments compared with without alignments is approximately 15.16%. Thus, extensive experimentation indicates that higher alignment accuracy implies a more reliable model and improved prediction accuracy, which can be trusted to deliver consistent performance across different layers and classes of subcellular localization predictions. This research provides valuable insights into prediction accuracies with and without alignments, offering bioinformaticians an effective tool for better understanding while potentially reducing the need for extensive experimental validations. The source code and datasets are available at http://distilldeep.ucd.ie/SCL8/.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), Dublin, Ireland
3
Hu C, Liu G, Zhang Z, Pan Q, Zhang X, Liu W, Li Z, Li M, Zhu P, Ji T, Garber PA, Zhou X. Genetic linkage disequilibrium of deleterious mutations in threatened mammals. EMBO Rep 2024; 25:5620-5634. [PMID: 39487369 PMCID: PMC11624202 DOI: 10.1038/s44319-024-00307-2] [Received: 09/28/2023] [Revised: 10/09/2024] [Accepted: 10/16/2024] [Indexed: 11/04/2024] Open
Abstract
The impact of negative selection against deleterious mutations in endangered species remains underexplored. Recent studies have measured mutation load by comparing the accumulation of deleterious mutations; however, this method is most effective when comparing within and between populations of phylogenetically closely related species. Here, we introduce new statistics, LDcor and its standardized form nLDcor, which allow us to detect and compare global linkage disequilibrium of deleterious mutations across species using unphased genotypes. These statistics measure averaged pairwise standardized covariance and standardize mutation differences based on the standard deviation of alleles to reflect selection intensity. We then examined selection strength in the genomes of seven mammals. Tigers exhibited an over-dispersion of deleterious mutations, while gorillas, giant pandas, and golden snub-nosed monkeys displayed negative linkage disequilibrium. Furthermore, the distribution of deleterious mutations in threatened mammals did not reveal consistent trends. Our results indicate that these newly developed statistics could help us understand the genetic burden of threatened species.
Affiliation(s)
- Chunyan Hu
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China
- Gaoming Liu
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- Zhan Zhang
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China
- Qi Pan
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China
- Xiaoxiao Zhang
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China
- Weiqiang Liu
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China
- Zihao Li
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China
- Meng Li
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- Pingfen Zhu
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- Ting Ji
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
- Paul A Garber
  - Department of Anthropology, Program in Ecology, Evolution, and Conservation Biology, University of Illinois, Urbana, IL, USA
  - International Center of Biodiversity and Primate Conservation, Dali University, Dali, China
- Xuming Zhou
  - Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, 100101, Beijing, China
4
Stock W, Rousseau C, Dierickx G, D'hondt S, Amadei Martínez L, Dittami SM, van der Loos LM, De Clerck O. Breaking free from references: a consensus-based approach for community profiling with long amplicon nanopore data. Brief Bioinform 2024; 26:bbae642. [PMID: 39679438 DOI: 10.1093/bib/bbae642] [Received: 06/28/2024] [Revised: 10/30/2024] [Accepted: 12/01/2024] [Indexed: 12/17/2024] Open
Abstract
Third-generation sequencing platforms, such as Oxford Nanopore Technologies (ONT), have made it possible to characterize communities through the sequencing of long amplicons. While this theoretically allows for an increased taxonomic resolution compared to short-read sequencing platforms such as Illumina, the high error rate remains problematic for accurately identifying the community members present within a sample. Here, we present and validate CONCOMPRA, a tool that allows the detection of closely related strains within a community by drafting and mapping to consensus sequences. We show that CONCOMPRA outperforms several other tools for profiling bacterial communities using full-length 16S rRNA gene sequencing. Since CONCOMPRA does not rely on a sequence database for profiling communities, it is applicable to systems and amplicons for which little to no reference data exists. Our validation test shows that the amplification of long PCR products is likely to produce chimeric byproducts that inflate alpha diversity and skew community structure, stressing the importance of chimera detection. CONCOMPRA is available on GitHub (https://github.com/willem-stock/CONCOMPRA).
Affiliation(s)
- Willem Stock
  - Phycology Research Group, Ghent University, 9000 Gent, Belgium
- Coralie Rousseau
  - Sorbonne University, CNRS, Laboratory of Integrative Biology of Marine Models (LBI2M, UMR 8227), Station Biologique de Roscoff (SBR), Roscoff, France
- Glen Dierickx
  - Research Group Mycology, Ghent University, 9000 Gent, Belgium
  - Research Unit Forest Ecology and Management, Research Institute for Nature and Forest, Geraardsbergen, Belgium
- Sofie D'hondt
  - Phycology Research Group, Ghent University, 9000 Gent, Belgium
- Luz Amadei Martínez
  - Laboratory of Protistology and Aquatic Ecology, Ghent University, 9000 Gent, Belgium
- Simon M Dittami
  - Sorbonne University, CNRS, Laboratory of Integrative Biology of Marine Models (LBI2M, UMR 8227), Station Biologique de Roscoff (SBR), Roscoff, France
5
Akamatsu S, Mitsuhashi S, Soga K, Mizukami H, Shiraishi M, Frith MC, Yamano Y. Targeted nanopore sequencing using the Flongle device to identify mitochondrial DNA variants. Sci Rep 2024; 14:25161. [PMID: 39448697 PMCID: PMC11502840 DOI: 10.1038/s41598-024-75749-8] [Received: 05/29/2024] [Accepted: 10/08/2024] [Indexed: 10/26/2024] Open
Abstract
Variants in mitochondrial genomes (mtDNA) can cause various neurological and mitochondrial diseases such as mitochondrial myopathy, encephalopathy, lactic acidosis, and stroke-like episodes (MELAS). Given the 16 kb length of mtDNA, continuous sequencing is feasible using long-read sequencing (LRS). Herein, we aimed to show a simple and accessible method for comprehensive mtDNA sequencing with potential diagnostic applications for mitochondrial diseases using the compact and affordable LRS flow cell "Flongle." Whole mtDNA amplification (WMA) was performed using genomic DNA samples derived from four patients with mitochondrial diseases, followed by LRS using Flongle. We compared these results to those obtained using Cas9 enrichment. Additionally, the accuracy of heteroplasmy rates was assessed by incorporating mtDNA variants at equimolar levels. Finally, mtDNA from 19 patients with Parkinson's disease (PD) was sequenced using Flongle to identify disease risk-associated variants. mtDNA variants were detected in all four patients with mitochondrial diseases, with results comparable to those obtained from Cas9 enrichment. Heteroplasmy levels were accurately detected (r² > 0.99) via WMA using Flongle. A reported variant was identified in three patients with PD. In conclusion, Flongle can simplify the traditionally cumbersome and expensive mtDNA sequencing process, offering a streamlined and accessible approach to diagnosing mitochondrial diseases.
Affiliation(s)
- Shintaro Akamatsu
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, 2168511, Japan
- Satomi Mitsuhashi
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, 2168511, Japan
- Kaima Soga
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, 2168511, Japan
- Heisuke Mizukami
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, 2168511, Japan
- Makoto Shiraishi
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, 2168511, Japan
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan
- Yoshihisa Yamano
  - Department of Neurology, St. Marianna University School of Medicine, Kawasaki, 2168511, Japan
  - Department of Rare Diseases Research, Institute of Medical Science, St. Marianna University School of Medicine, Kawasaki, Japan
6
Frith MC. A simple method for finding related sequences by adding probabilities of alternative alignments. Genome Res 2024; 34:1165-1173. [PMID: 39152037 PMCID: PMC11444175 DOI: 10.1101/gr.279464.124] [Received: 04/13/2024] [Accepted: 08/14/2024] [Indexed: 08/19/2024]
Abstract
The main way of analyzing genetic sequences is by finding sequence regions that are related to each other. There are many methods to do that, usually based on this idea: Find an alignment of two sequence regions, which would be unlikely to exist between unrelated sequences. Unfortunately, it is hard to tell if an alignment is likely to exist by chance. Also, the precise alignment of related regions is uncertain. One alignment does not hold all evidence that they are related. We should consider alternative alignments too. This is rarely done, because we lack a simple and fast method that fits easily into practical sequence-search software. Described here is the simplest-conceivable change to standard sequence alignment, which sums probabilities of alternative alignments and makes it easier to tell if a similarity is likely to occur by chance. This approach is better than standard alignment at finding distant relationships, at least in a few tests. It can be used in practical sequence-search software, with minimal increase in implementation difficulty or run time. It generalizes to different kinds of alignment, for example, DNA-versus-protein with frameshifts. Thus, it can widely contribute to finding subtle relationships between sequences.
Affiliation(s)
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
  - Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8568, Japan
  - Computational Bio Big Data Open Innovation Laboratory, AIST, Tokyo 169-8555, Japan
7
Melton AE, Novak SJ, Buerki S. Utilizing a comparative approach to assess genome evolution during diploidization in Artemisia tridentata, a keystone species of western North America. Am J Bot 2024; 111:e16353. [PMID: 38826031 DOI: 10.1002/ajb2.16353] [Received: 12/07/2023] [Revised: 04/03/2024] [Accepted: 04/03/2024] [Indexed: 06/04/2024]
Abstract
PREMISE: Polyploidization is often followed by diploidization. Diploidization is generally studied using synthetic polyploid lines and/or crop plants, but rarely using extant diploids or nonmodel plants such as Artemisia tridentata. This threatened western North American keystone species has a large genome compared to congeneric Artemisia species and is dominated by diploid and tetraploid cytotypes, with multiple origins of tetraploids with genome size reduction. METHODS: The genome of an A. tridentata sample was resequenced to study genome evolution and compared to that of A. annua, a diploid congener. Three diploid genomes of A. tridentata were compared to test for multiple diploidization events. RESULTS: The A. tridentata genome had many chromosomal rearrangements relative to that of A. annua, while large-scale synteny of A. tridentata chromosome 3 and A. annua chromosome 4 was conserved. The three A. tridentata genomes had similar sizes (4.19-4.2 Gbp), heterozygosity (2.24-2.25%), and sequence (98.73-99.15% similarity) across scaffolds, and in k-mer analyses, similar patterns of diploid heterozygous k-mers (AB = 41%, 47%, and 47%), triploid heterozygous k-mers (AAB = 18-21%), and tetraploid k-mers (AABB = 13-17%). Biallelic SNPs were evenly distributed across scaffolds for all individuals. Comparisons of transposable element (TE) content revealed differential enrichment of TE clades. CONCLUSIONS: Our findings suggest population-level TE differentiation after a shared polyploidization-to-diploidization event(s) and exemplify the complex processes of genome evolution. This research approach provides new resources for exploration of abiotic stress response, especially the roles of TEs in response pathways.
Affiliation(s)
- Anthony E Melton
  - Department of Biological Sciences, Boise State University, Boise, 83725, ID, USA
- Stephen J Novak
  - Department of Biological Sciences, Boise State University, Boise, 83725, ID, USA
- Sven Buerki
  - Department of Biological Sciences, Boise State University, Boise, 83725, ID, USA
8
Chen L, Ma J, Xu W, Shen F, Yang Z, Sonne C, Dietz R, Li L, Jie X, Li L, Yan G, Zhang X. Comparative transcriptome and methylome of polar bears, giant and red pandas reveal diet-driven adaptive evolution. Evol Appl 2024; 17:e13731. [PMID: 38894980 PMCID: PMC11183199 DOI: 10.1111/eva.13731] [Received: 07/21/2023] [Revised: 05/18/2024] [Accepted: 05/27/2024] [Indexed: 06/21/2024] Open
Abstract
Epigenetic regulation plays an important role in the evolution of species adaptations, yet little information is available on the epigenetic mechanisms underlying the adaptive evolution of bamboo-eating in both giant pandas (Ailuropoda melanoleuca) and red pandas (Ailurus fulgens). To investigate the potential contribution of epigenetics to the adaptive evolution of bamboo-eating in giant and red pandas, we performed hepatic comparative transcriptome and methylome analyses between bamboo-eating pandas and carnivorous polar bears (Ursus maritimus). We found that genes involved in carbohydrate, lipid, amino acid, and protein metabolism showed significant differences in methylation and expression levels between the two panda species and polar bears. Clustering analysis of gene expression revealed that giant pandas did not form a sister group with the more closely related polar bears, suggesting that the expression patterns of genes in the livers of giant pandas and red pandas have evolved convergently, driven by their similar diets. Compared to polar bears, some key genes involved in carbohydrate metabolism, biological oxidation, and cholesterol synthesis showed hypomethylation and higher expression in giant and red pandas, while genes involved in fat digestion and absorption, fatty acid metabolism, lysine degradation, resistance to lipid peroxidation, and detoxification showed hypermethylation and low expression. Our study elucidates the special nutrient utilization mechanism of giant pandas and red pandas and provides some insights into the molecular mechanism of their adaptive evolution of bamboo feeding. This has important implications for the breeding and conservation of giant pandas and red pandas.
Affiliation(s)
- Lei Chen
  - Key Laboratory of bio‐Resources and eco‐Environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, China
- Jinnan Ma
  - Key Laboratory of bio‐Resources and eco‐Environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, China
  - College of Continuing Education, Yunnan Normal University, Kunming, China
- Wencai Xu
  - Key Laboratory of bio‐Resources and eco‐Environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, China
- Fujun Shen
  - Sichuan Key Laboratory for Conservation Biology of Endangered Wildlife, Chengdu Research Base of Giant Panda Breeding, Chengdu, China
- Christian Sonne
  - Arctic Research Centre, Faculty of Science and Technology, Department of Ecoscience, Aarhus University, Roskilde, Denmark
- Rune Dietz
  - Arctic Research Centre, Faculty of Science and Technology, Department of Ecoscience, Aarhus University, Roskilde, Denmark
- Linzhu Li
  - Key Laboratory of bio‐Resources and eco‐Environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, China
- Xiaodie Jie
  - Key Laboratory of bio‐Resources and eco‐Environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, China
- Lu Li
  - Key Laboratory of bio‐Resources and eco‐Environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, China
- Guoqiang Yan
  - Key Laboratory of bio‐Resources and eco‐Environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, China
- Xiuyue Zhang
  - Key Laboratory of bio‐Resources and eco‐Environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, China
  - Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, China
9
Gribling-Burrer AS, Bohn P, Smyth RP. Isoform-specific RNA structure determination using Nano-DMS-MaP. Nat Protoc 2024; 19:1835-1865. [PMID: 38347203 DOI: 10.1038/s41596-024-00959-3] [Received: 08/01/2023] [Accepted: 12/12/2023] [Indexed: 06/12/2024]
Abstract
RNA structure determination is essential to understand how RNA carries out its diverse biological functions. In cells, RNA isoforms are readily expressed with partial variations within their sequences due, for example, to alternative splicing, heterogeneity in the transcription start site, RNA processing or differential termination/polyadenylation. Nanopore dimethyl sulfate mutational profiling (Nano-DMS-MaP) is a method for in situ isoform-specific RNA structure determination. Unlike similar methods that rely on short sequencing reads, Nano-DMS-MaP employs nanopore sequencing to resolve the structures of long and highly similar RNA molecules to reveal their previously hidden structural differences. This Protocol describes the development and applications of Nano-DMS-MaP and outlines the main considerations for designing and implementing a successful experiment: from bench to data analysis. In-cell probing experiments can be carried out by an experienced molecular biologist in 3-4 d. Data analysis requires good knowledge of command line tools and Python scripts and requires a further 3-5 d.
Affiliation(s)
- Anne-Sophie Gribling-Burrer
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
- Patrick Bohn
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
- Redmond P Smyth
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
  - Faculty of Medicine, University of Würzburg, Würzburg, Germany
10
Plessy C, Mansfield MJ, Bliznina A, Masunaga A, West C, Tan Y, Liu AW, Grašič J, Del Río Pisula MS, Sánchez-Serna G, Fabrega-Torrus M, Ferrández-Roldán A, Roncalli V, Navratilova P, Thompson EM, Onuma T, Nishida H, Cañestro C, Luscombe NM. Extreme genome scrambling in marine planktonic Oikopleura dioica cryptic species. Genome Res 2024; 34:426-440. [PMID: 38621828 PMCID: PMC11067885 DOI: 10.1101/gr.278295.123] [Received: 07/19/2023] [Accepted: 02/28/2024] [Indexed: 04/17/2024]
Abstract
Genome structural variations within species are rare. How selective constraints preserve gene order and chromosome structure is a central question in evolutionary biology that remains unsolved. Our sequencing of several genomes of the appendicularian tunicate Oikopleura dioica from around the globe reveals extreme genome scrambling caused by thousands of chromosomal rearrangements, although these animals show no obvious morphological differences. The breakpoint accumulation rate is an order of magnitude higher than in ascidian tunicates, nematodes, Drosophila, or mammals. Chromosome arms and sex-specific regions appear to be the primary unit of macrosynteny conservation. At the microsyntenic level, scrambling did not preserve operon structures, suggesting an absence of selective pressure to maintain them. The uncoupling of the genome scrambling with morphological conservation in O. dioica suggests the presence of previously unnoticed cryptic species and provides a new biological system that challenges our previous vision of speciation in which similar animals always share similar genome structures.
Affiliation(s)
- Charles Plessy
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
- Michael J Mansfield
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
- Aleksandra Bliznina
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
- Aki Masunaga
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
- Charlotte West
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
- Yongkai Tan
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
- Andrew W Liu
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
- Jan Grašič
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
- María Sara Del Río Pisula
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
- Gaspar Sánchez-Serna
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona (UB), Barcelona 08028, Spain
  - Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona 08028, Spain
- Marc Fabrega-Torrus
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona (UB), Barcelona 08028, Spain
  - Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona 08028, Spain
- Alfonso Ferrández-Roldán
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona (UB), Barcelona 08028, Spain
  - Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona 08028, Spain
- Vittoria Roncalli
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona (UB), Barcelona 08028, Spain
  - Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona 08028, Spain
- Pavla Navratilova
  - Centre of Plant Structural and Functional Genomics, Institute of Experimental Botany, 779 00 Olomouc, Czech Republic
  - Sars International Centre, University of Bergen, Bergen N-5008, Norway
- Eric M Thompson
  - Sars International Centre, University of Bergen, Bergen N-5008, Norway
  - Department of Biological Sciences, University of Bergen, Bergen N-5020, Norway
- Takeshi Onuma
  - Faculty of Science, Kagoshima University, Kagoshima 890-0065, Japan
  - Department of Biological Sciences, Graduate School of Science, Osaka University, Toyonaka, Osaka 560-0043, Japan
- Hiroki Nishida
  - Department of Biological Sciences, Graduate School of Science, Osaka University, Toyonaka, Osaka 560-0043, Japan
- Cristian Cañestro
  - Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona (UB), Barcelona 08028, Spain
  - Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona 08028, Spain
- Nicholas M Luscombe
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa 904-0495, Japan
| |
|
11
|
Srivathsan A, Feng V, Suárez D, Emerson B, Meier R. ONTbarcoder 2.0: rapid species discovery and identification with real-time barcoding facilitated by Oxford Nanopore R10.4. Cladistics 2024; 40:192-203. [PMID: 38041646 DOI: 10.1111/cla.12566] [Received: 08/02/2023] [Revised: 10/27/2023] [Accepted: 10/27/2023] [Indexed: 12/03/2023]
Abstract
Most arthropod species are undescribed and hidden in specimen-rich samples that are difficult to sort to species using morphological characters. For such samples, sorting to putative species with DNA barcodes is an attractive alternative, but needs cost-effective techniques that are suitable for use in many laboratories around the world. Barcoding using the portable and inexpensive MinION sequencer produced by Oxford Nanopore Technologies (ONT) could be useful for presorting specimen-rich samples with DNA barcodes because it requires little space and is inexpensive. However, similarly important is user-friendly and reliable software for analysis of the ONT data. It is here provided in the form of ONTbarcoder 2.0 that is suitable for all commonly used operating systems and includes a Graphical User Interface (GUI). Compared with an earlier version, ONTbarcoder 2.0 has three key improvements related to the higher read quality obtained with ONT's latest flow cells (R10.4), chemistry (V14 kits) and basecalling model (super-accuracy model). First, the improved read quality of ONT's latest flow cells (R10.4) allows for the use of primers with shorter indices than those previously needed (9 bp vs. 12-13 bp). This decreases the primer cost and can potentially improve PCR success rates. Second, ONTbarcoder now delivers real-time barcoding to complement ONT's real-time sequencing. This means that the first barcodes are obtained within minutes of starting a sequencing run; i.e. flow cell use can be optimized by terminating sequencing runs when most barcodes have already been obtained. The only input needed by ONTbarcoder 2.0 is a demultiplexing sheet and sequencing data (raw or basecalled) generated by either a Mk1B or a Mk1C. Thirdly, we demonstrate that the availability of R10.4 chemistry for the low-cost Flongle flow cell is an attractive option for users who require only 200-250 barcodes at a time.
Affiliation(s)
- Amrita Srivathsan
- Center for Integrative Biodiversity Discovery, Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Invalidenstraße 43, 10115, Berlin, Germany
| | - Vivian Feng
- Center for Integrative Biodiversity Discovery, Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Invalidenstraße 43, 10115, Berlin, Germany
| | - Daniel Suárez
- Island Ecology and Evolution Research Group, Institute of Natural Products and Agrobiology (IPNA-CSIC), C/Astrofísico Francisco Sánchez 3, La Laguna, Tenerife, Canary Islands, 38206, Spain
- School of Doctoral and Postgraduate Studies, University of La Laguna, 38200 La Laguna, Tenerife, Canary Islands, 38200, Spain
| | - Brent Emerson
- Island Ecology and Evolution Research Group, Institute of Natural Products and Agrobiology (IPNA-CSIC), C/Astrofísico Francisco Sánchez 3, La Laguna, Tenerife, Canary Islands, 38206, Spain
| | - Rudolf Meier
- Center for Integrative Biodiversity Discovery, Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Invalidenstraße 43, 10115, Berlin, Germany
- Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore, 117543, Singapore
| |
|
12
|
Miller JR, Adjeroh DA. Machine learning on alignment features for parent-of-origin classification of simulated hybrid RNA-seq. BMC Bioinformatics 2024; 25:109. [PMID: 38475727 DOI: 10.1186/s12859-024-05728-3] [Received: 07/28/2023] [Accepted: 03/01/2024] [Indexed: 03/14/2024]
Abstract
BACKGROUND: Parent-of-origin allele-specific gene expression (ASE) can be detected in interspecies hybrids by virtue of RNA sequence variants between the parental haplotypes. ASE is detectable by differential expression analysis (DEA) applied to the counts of RNA-seq read pairs aligned to parental references, but aligners do not always choose the correct parental reference. RESULTS: We used public data for species that are known to hybridize. We measured our ability to assign RNA-seq read pairs to their proper transcriptome or genome references. We tested software packages that assign each read pair to a reference position and found that they often favored the incorrect species reference. To address this problem, we introduce a post process that extracts alignment features and trains a random forest classifier to choose the better alignment. On each simulated hybrid dataset tested, our machine-learning post-processor achieved higher accuracy than the aligner by itself at choosing the correct parent-of-origin per RNA-seq read pair. CONCLUSIONS: For the parent-of-origin classification of RNA-seq, machine learning can improve the accuracy of alignment-based methods. This approach could be useful for enhancing ASE detection in interspecies hybrids, though RNA-seq from real hybrids may present challenges not captured by our simulations. We believe this is the first application of machine learning to this problem domain.
Affiliation(s)
- Jason R Miller
- Department of Computer Science, Mathematics, Engineering, Shepherd University, Shepherdstown, WV, USA.
- EVOGENE, Department of Biosciences, University of Oslo, Oslo, Norway.
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA.
| | - Donald A Adjeroh
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
| |
|
13
|
Tachikawa K, Shimizu T, Imai T, Ko R, Kawai Y, Omae Y, Tokunaga K, Frith MC, Yamano Y, Mitsuhashi S. Cost-Effective Cas9-Mediated Targeted Sequencing of Spinocerebellar Ataxia Repeat Expansions. J Mol Diagn 2024; 26:85-95. [PMID: 38008286 DOI: 10.1016/j.jmoldx.2023.10.004] [Received: 06/26/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/28/2023]
Abstract
Hereditary repeat diseases are caused by an abnormal expansion of short tandem repeats in the genome. Among them, spinocerebellar ataxia (SCA) is a heterogeneous disease, and currently, 16 responsible repeats are known. Genetic diagnosis is obtained by analyzing the number of repeats through separate testing of each repeat. Although simultaneous detection of candidate repeats using current massively parallel sequencing technologies has been developed to avoid complicated multiple experiments, these methods are generally expensive. This study developed a cost-effective SCA repeat panel [Flongle SCA repeat panel sequencing (FLO-SCAp)] using Cas9-mediated targeted long-read sequencing and the smallest long-read sequencing apparatus, Flongle. This panel enabled the detection of repeat copy number changes, internal repeat sequences, and DNA methylation in seven patients with different repeat expansion diseases. The median (interquartile range) values of coverage and on-target rate were 39.5 (12 to 72) and 11.6% (7.5% to 16.5%), respectively. This approach was validated by comparing repeat copy number changes measured by FLO-SCAp and short-read whole-genome sequencing. A high correlation was observed between FLO-SCAp and short-read whole-genome sequencing when the repeat length was ≤250 bp (r = 0.98; P < 0.001). Thus, FLO-SCAp represents the most cost-effective method for conducting multiplex testing of repeats and can serve as the first-line diagnostic tool for SCA.
Affiliation(s)
- Keiji Tachikawa
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
| | - Takahiro Shimizu
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
| | - Takeshi Imai
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
| | - Riyoko Ko
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan
| | - Yosuke Kawai
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
| | - Yosuke Omae
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
| | - Katsushi Tokunaga
- Genome Medical Science Project, National Center for Global Health and Medicine, Tokyo, Japan
| | - Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan; Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan; Computational Bio Big-Data Open Innovation Laboratory, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
| | - Yoshihisa Yamano
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan; Department of Rare Diseases Research, Institute of Medical Science, St. Marianna University School of Medicine, Kawasaki, Japan
| | - Satomi Mitsuhashi
- Department of Neurology, St. Marianna University School of Medicine, Kawasaki, Japan.
| |
|
14
|
Srivathsan A, Meier R. Scalable, Cost-Effective, and Decentralized DNA Barcoding with Oxford Nanopore Sequencing. Methods Mol Biol 2024; 2744:223-238. [PMID: 38683322 DOI: 10.1007/978-1-0716-3581-0_14] [Indexed: 05/01/2024]
Abstract
DNA barcodes are useful in biodiversity research, but sequencing barcodes with dye termination methods ("Sanger sequencing") has been so time-consuming and expensive that DNA barcodes are not as widely used as they should be. Fortunately, MinION sequencers from Oxford Nanopore Technologies have recently emerged as a cost-effective and efficient alternative for barcoding. MinION barcodes are now suitable for large-scale species discovery and enable specimen identification when the target species are represented in barcode databases. With a MinION, it is possible to obtain 10,000 barcodes from a single flow cell at a cost of less than 0.10 USD per specimen. Additionally, a Flongle flow cell can be used for small projects requiring up to 300 barcodes (0.50 USD per specimen). We here describe a cost-effective laboratory workflow for obtaining tagged amplicons, preparing ONT libraries, sequencing amplicon pools, and analyzing the MinION reads with the software ONTbarcoder. This workflow has been shown to yield highly accurate barcodes that are 99.99% identical to Sanger barcodes. Overall, we propose that the use of MinION for DNA barcoding is an attractive option for all researchers in need of a cost-effective and efficient solution for large-scale species discovery and specimen identification.
Affiliation(s)
- Amrita Srivathsan
- Center for Integrative Biodiversity Discovery, Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin, Germany
| | - Rudolf Meier
- Center for Integrative Biodiversity Discovery, Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Berlin, Germany.
- Institute for Biology, Humboldt University, Berlin, Germany.
| |
|
15
|
Yusuf LH, Saldívar Lemus Y, Thorpe P, Macías Garcia C, Ritchie MG. Genomic Signatures Associated with Transitions to Viviparity in Cyprinodontiformes. Mol Biol Evol 2023; 40:msad208. [PMID: 37789509 PMCID: PMC10568250 DOI: 10.1093/molbev/msad208] [Received: 02/23/2023] [Revised: 08/23/2023] [Accepted: 09/19/2023] [Indexed: 10/05/2023]
Abstract
The transition from oviparity to viviparity has occurred independently over 150 times across vertebrates, presenting one of the most compelling cases of phenotypic convergence. However, whether the repeated, independent evolution of viviparity is driven by redeployment of similar genetic mechanisms and whether these leave a common signature in genomic divergence remains largely unknown. Although recent investigations into the evolution of viviparity have demonstrated striking similarity among the genes and molecular pathways involved across disparate vertebrate groups, quantitative tests for genome-wide convergence have provided ambivalent answers. Here, we investigate the potential role of molecular convergence during independent transitions to viviparity across an order of ray-finned freshwater fish (Cyprinodontiformes). We assembled de novo genomes and utilized publicly available genomes of viviparous and oviparous species to test for molecular convergence across both coding and noncoding regions. We found no evidence for an excess of molecular convergence in amino acid substitutions and in rates of sequence divergence, implying independent genetic changes are associated with these transitions. However, both statistical power and biological confounds could constrain our ability to detect significant correlated evolution. We therefore identified candidate genes with potential signatures of molecular convergence in viviparous Cyprinodontiformes lineages. Motif enrichment and gene ontology analyses suggest transcriptional changes associated with early morphogenesis, brain development, and immunity occurred alongside the evolution of viviparity. Overall, however, our findings indicate that independent transitions to viviparity in these fish are not strongly associated with an excess of molecular convergence, but a few genes show convincing evidence of convergent evolution.
Affiliation(s)
- Leeban H Yusuf
- Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, UK
| | - Yolitzi Saldívar Lemus
- Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, UK
- Department of Biology, Texas State University, San Marcos, TX, USA
| | - Peter Thorpe
- The Data Analysis Group, School of Life Sciences, University of Dundee, Dundee, UK
- School of Medicine, University of North Haugh, St Andrews KY16 9TF, UK
| | - Constantino Macías Garcia
- Instituto de Ecologia, Universidad Nacional Autónoma de México, Ciudad Universitaria, Mexico City CdMx, Mexico
| | - Michael G Ritchie
- Centre for Biological Diversity, School of Biology, University of St Andrews, St Andrews, UK
| |
|
16
|
Bertram H, Wilhelmi S, Rajavel A, Boelhauve M, Wittmann M, Ramzan F, Schmitt AO, Gültas M. Comparative Investigation of Coincident Single Nucleotide Polymorphisms Underlying Avian Influenza Viruses in Chickens and Ducks. Biology 2023; 12:969. [PMID: 37508399 PMCID: PMC10375970 DOI: 10.3390/biology12070969] [Received: 04/25/2023] [Revised: 06/26/2023] [Accepted: 07/04/2023] [Indexed: 07/30/2023]
Abstract
Avian influenza is a severe viral infection that has the potential to cause human pandemics. In particular, chickens are susceptible to many highly pathogenic strains of the virus, resulting in significant losses. In contrast, ducks have been reported to exhibit rapid and effective innate immune responses to most avian influenza virus (AIV) infections. To explore the distinct genetic programs that potentially distinguish the susceptibility/resistance of both species to AIV, the investigation of coincident SNPs (coSNPs) and their differing causal effects on gene functions in both species is important to gain novel insight into the varying immune-related responses of chickens and ducks. By conducting a pairwise genome alignment between these species, we identified coSNPs and their respective effect on AIV-related differentially expressed genes (DEGs) in this study. The examination of these genes (e.g., CD74, RUBCN, and SHTN1 for chickens and ABCA3, MAP2K6, and VIPR2 for ducks) reveals their high relevance to AIV. Further analysis of these genes provides promising effector molecules (such as IκBα, STAT1/STAT3, GSK-3β, or p53) and related key signaling pathways (such as NF-κB, JAK/STAT, or Wnt) to elucidate the complex mechanisms of immune responses to AIV infections in both chickens and ducks.
Affiliation(s)
- Hendrik Bertram
- Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
| | - Selina Wilhelmi
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
| | - Abirami Rajavel
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
| | - Marc Boelhauve
- Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
| | - Margareta Wittmann
- Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
| | - Faisal Ramzan
- Institute of Animal and Dairy Sciences, University of Agriculture, Faisalabad 38000, Pakistan
| | - Armin Otto Schmitt
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
| | - Mehmet Gültas
- Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
- Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075 Göttingen, Germany
| |
|
17
|
Cosi I, Moccia A, Pescucci C, Munagala U, Di Giorgio S, Sineo I, Conticello SG, Notaro R, De Angioletti M. Identification and characterization of novel ETV4 splice variants in prostate cancer. Sci Rep 2023; 13:5267. [PMID: 37002241 PMCID: PMC10066307 DOI: 10.1038/s41598-023-29484-1] [Received: 09/26/2022] [Accepted: 02/06/2023] [Indexed: 04/03/2023]
Abstract
ETV4, one of the ETS proteins overexpressed in prostate cancer, promotes migration, invasion, and proliferation in prostate cells. This study identifies a series of previously unknown ETV4 alternatively spliced transcripts in human prostate cell lines. Their expression has been validated using several unbiased techniques, including Nanopore sequencing. Most of these transcripts originate from an in-frame exon skipping and, thus, are expected to be translated into ETV4 protein isoforms. Functional analysis of the most abundant among these isoforms shows that they retain some activity, namely a reduced ability to promote proliferation and a residual ability to regulate the transcription of ETV4 target genes. Alternatively spliced genes are common in cancer cells: an analysis of the TCGA dataset confirms the abundance of these novel ETV4 transcripts in prostate tumors, in contrast to peritumoral tissues. Since none of their translated isoforms have acquired a higher oncogenic potential, such abundance is likely to reflect the deranged splicing machinery of the tumor. However, it is also possible that their interaction with the canonical variants may contribute to the biology and the clinics of prostate cancer. Further investigations are needed to elucidate the biological role of these ETV4 transcripts and of their putative isoforms.
Affiliation(s)
- Irene Cosi
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Viale Pieraccini 6, 50139, Florence, Italy
- ICCOM - National Research Council, Sesto Fiorentino, Florence, Italy
| | - Annalisa Moccia
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Viale Pieraccini 6, 50139, Florence, Italy
| | - Chiara Pescucci
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Viale Pieraccini 6, 50139, Florence, Italy
| | - Uday Munagala
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Viale Pieraccini 6, 50139, Florence, Italy
| | - Salvatore Di Giorgio
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Viale Pieraccini 6, 50139, Florence, Italy
| | - Irene Sineo
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Viale Pieraccini 6, 50139, Florence, Italy
| | - Silvestro G Conticello
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Viale Pieraccini 6, 50139, Florence, Italy
- IFC - National Research Council, Pisa, Italy
| | - Rosario Notaro
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Viale Pieraccini 6, 50139, Florence, Italy
- IFC - National Research Council, Pisa, Italy
| | - Maria De Angioletti
- Core Research Laboratory, Istituto per lo Studio, la Prevenzione e la Rete Oncologica (ISPRO), Viale Pieraccini 6, 50139, Florence, Italy.
- ICCOM - National Research Council, Sesto Fiorentino, Florence, Italy.
| |
|
18
|
Abstract
Abnormal expansion or shortening of tandem repeats can cause a variety of genetic diseases. The use of long DNA reads has facilitated the analysis of disease-causing repeats in the human genome. Long read sequencers enable us to directly analyze repeat length and sequence content by covering whole repeats; they are therefore considered suitable for the analysis of long tandem repeats. Here, we describe an expanded repeat analysis using target sequencing data produced by the Oxford Nanopore Technologies (hereafter referred to as ONT) nanopore sequencer.
Affiliation(s)
- Satomi Mitsuhashi
- Department of Genomic Function and Diversity, Tokyo Medical and Dental University, Tokyo, Japan.
- Division of Neurology, Department of Internal Medicine, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan.
| | - Martin C Frith
- Artificial Intelligence Research Center, AIST, Tokyo, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
- Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan
| |
|
19
|
Frith MC, Mitsuhashi S. Finding Rearrangements in Nanopore DNA Reads with LAST and dnarrange. Methods Mol Biol 2023; 2632:161-175. [PMID: 36781728 DOI: 10.1007/978-1-0716-2996-3_12] [Indexed: 04/27/2023]
Abstract
Long-read DNA sequencing techniques such as nanopore are especially useful for characterizing complex sequence rearrangements, which occur in some genetic diseases and also during evolution. Analyzing the sequence data to understand such rearrangements is not trivial, due to sequencing error, rearrangement intricacy, and abundance of repeated similar sequences in genomes. The LAST and dnarrange software packages can resolve complex relationships between DNA sequences and characterize changes such as gene conversion, processed pseudogene insertion, and chromosome shattering. They can filter out numerous rearrangements shared by controls, e.g., healthy humans versus a patient, to focus on rearrangements unique to the patient. One useful ingredient is last-train, which learns the rates (probabilities) of deletions, insertions, and each kind of base match and mismatch. These probabilities are then used to find the most likely sequence relationships/alignments, which is especially useful for DNA with unusual rates, such as DNA from Plasmodium falciparum (malaria) with ∼80% A+T. This is also useful for less-studied species that lack reference genomes, so the DNA reads are compared to a different species' genome. We also point out that a reference genome with ancestral alleles would be ideal.
Affiliation(s)
- Martin C Frith
- Artificial Intelligence Research Center, AIST, Tokyo, Japan.
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan.
- Computational Bio Big-Data Open Innovation Laboratory, AIST, Tokyo, Japan.
| | - Satomi Mitsuhashi
- Department of Genomic Function and Diversity, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Neurology, Department of Internal Medicine, St. Marianna University School of Medicine, Kawasaki, Japan
| |
|
20
|
Zhao Q, Shao F, Li Y, Yi SV, Peng Z. Novel genome sequence of Chinese cavefish (Triplophysa rosa) reveals pervasive relaxation of natural selection in cavefish genomes. Mol Ecol 2022; 31:5831-5845. [PMID: 36125323 PMCID: PMC9828065 DOI: 10.1111/mec.16700] [Received: 09/22/2021] [Accepted: 09/15/2022] [Indexed: 01/13/2023]
Abstract
All cavefishes, living exclusively in caves across the globe, exhibit similar phenotypic traits, including the characteristic loss of eyes. To understand whether such phenotypic convergence shares similar genomic bases, here we investigated genome-wide evolutionary signatures of cavefish phenotypes by comparing whole-genome sequences of three pairs of cavefishes and their surface fish relatives. Notably, we newly sequenced and generated a whole-genome assembly of the Chinese cavefish Triplophysa rosa. Our comparative analyses revealed several shared features of cavefish genome evolution. Cavefishes had lower mutation rates than their surface fish relatives. In contrast, the ratio of nonsynonymous to synonymous substitutions (ω) was significantly elevated in cavefishes compared to in surface fishes, consistent with the relaxation of purifying selection. In addition, cavefish genomes had an increased mutational load, including mutations that alter protein hydrophobicity profiles, which were considered harmful. Interestingly, however, we found no overlap in positively selected genes among different cavefish lineages, indicating that the phenotypic convergence in cavefishes was not caused by positive selection of the same sets of genes. Analyses of previously identified candidate genes associated with cave phenotypes supported this conclusion. Genes belonging to the lipid metabolism functional ontology were under relaxed purifying selection in all cavefish genomes, which may be associated with the nutrient-poor habitat of cavefishes. Our work reveals previously uncharacterized patterns of cavefish genome evolution and provides comparative insights into the evolution of cave-associated phenotypic traits.
Affiliation(s)
- Qingyuan Zhao
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University School of Life Sciences, Chongqing, China; Department of Laboratory Animal Science, College of Basic Medical Sciences, Army Medical University (Third Military Medical University), Chongqing, China
| | - Feng Shao
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University School of Life Sciences, Chongqing, China
| | - Yanping Li
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University School of Life Sciences, Chongqing, China; Key Laboratory of Sichuan Province for Fish Conservation and Utilization in the Upper Reaches of the Yangtze River, Neijiang Normal University College of Life Sciences, Neijiang, China
| | - Soojin V. Yi
- Department of Ecology, Evolution and Marine Biology, University of California, Santa Barbara, California, USA
| | - Zuogang Peng
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University School of Life Sciences, Chongqing, China; Academy of Plateau Science and Sustainability, Qinghai Normal University, Xining, China
| |
|
21
|
Teo WW, Cao X, Wu CS, Tan HK, Zhou Q, Gao C, Vanuytsel K, Kumar SS, Murphy GJ, Yang H, Chai L, Tenen DG. Non-coding RNA LEVER sequestration of PRC2 can mediate long range gene regulation. Commun Biol 2022; 5:343. [PMID: 35411071 PMCID: PMC9001699 DOI: 10.1038/s42003-022-03250-x] [Received: 08/27/2021] [Accepted: 03/09/2022] [Indexed: 11/20/2022]
Abstract
Polycomb Repressive Complex 2 (PRC2) is an epigenetic regulator required for gene silencing during development. Although PRC2 is a well-established RNA-binding complex, the biological function of PRC2-RNA interaction has been controversial. Here, we study the gene-regulatory role of the inhibitory PRC2-RNA interactions. We report a nuclear long non-coding RNA, LEVER, which mapped 236 kb upstream of the β-globin cluster as confirmed by Nanopore sequencing. LEVER RNA interacts with PRC2 in its nascent form, and this prevents the accumulation of the H3K27 repressive histone marks within LEVER locus. Interestingly, the accessible LEVER chromatin, in turn, suppresses the chromatin interactions between the ε-globin locus and β-globin locus control region (LCR), resulting in a repressive effect on ε-globin gene expression. Our findings validate that the nascent RNA-PRC2 interaction inhibits local PRC2 function in situ. More importantly, we demonstrate that such a local process can in turn regulate the expression of neighboring genes. Identification of a long non-coding RNA LEVER, that inhibits the Polycomb Repressive Complex 2 (PRC2) and controls nearby embryonic form of beta-globin gene, provides additional evidence for PRC2-RNA functional interaction.
Affiliation(s)
- Wei Wen Teo
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
| | - Xinang Cao
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore; Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Chan-Shuo Wu
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
| | - Hong Kee Tan
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore; National University of Singapore, Graduate School for Integrative Sciences and Engineering, Singapore, Singapore
| | - Qiling Zhou
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
| | - Chong Gao
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
| | - Kim Vanuytsel
- Section of Hematology and Medical Oncology, School of Medicine, Boston University, Boston, MA, USA
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, USA
| | - Sara S Kumar
- Section of Hematology and Medical Oncology, School of Medicine, Boston University, Boston, MA, USA
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, USA
| | - George J Murphy
- Section of Hematology and Medical Oncology, School of Medicine, Boston University, Boston, MA, USA
- Center for Regenerative Medicine, Boston University and Boston Medical Center, Boston, MA, USA
| | - Henry Yang
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
| | - Li Chai
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
| | - Daniel G Tenen
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Harvard Stem Cell Institute, Harvard Medical School, Boston, MA, USA
- Harvard Initiative for RNA Medicine, Harvard Medical School, Boston, MA, USA
|
22
|
Shrestha AMS, B Guiao JE, R Santiago KC. Assembly-free rapid differential gene expression analysis in non-model organisms using DNA-protein alignment. BMC Genomics 2022; 23:97. [PMID: 35120462 PMCID: PMC8815227 DOI: 10.1186/s12864-021-08278-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2021] [Accepted: 12/22/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND: RNA-seq is being increasingly adopted for gene expression studies in a panoply of non-model organisms, with applications spanning the fields of agriculture, aquaculture, ecology, and environment. For organisms that lack a well-annotated reference genome or transcriptome, a conventional RNA-seq data analysis workflow requires constructing a de-novo transcriptome assembly and annotating it against a high-confidence protein database. The assembly serves as a reference for read mapping, and the annotation is necessary for functional analysis of genes found to be differentially expressed. However, assembly is computationally expensive. It is also prone to errors that impact expression analysis, especially since sequencing depth is typically much lower for expression studies than for transcript discovery. RESULTS: We propose a shortcut, in which we obtain counts for differential expression analysis by directly aligning RNA-seq reads to the high-confidence proteome that would have been otherwise used for annotation. By avoiding assembly, we drastically cut down computational costs - the running time on a typical dataset improves from the order of tens of hours to under half an hour, and the memory requirement is reduced from the order of tens of Gbytes to tens of Mbytes. We show through experiments on simulated and real data that our pipeline not only reduces computational costs, but has higher sensitivity and precision than a typical assembly-based pipeline. A Snakemake implementation of our workflow is available at: https://bitbucket.org/project_samar/samar. CONCLUSIONS: The flip side of RNA-seq becoming accessible to even modestly resourced labs has been that the time, labor, and infrastructure cost of bioinformatics analysis has become a bottleneck. Assembly is one such resource-hungry process, and we show here that it can be avoided for quick and easy, yet more sensitive and precise, differential gene expression analysis in non-model organisms.
Affiliation(s)
- Anish M S Shrestha
- Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines.
- Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines.
| | - Joyce Emlyn B Guiao
- Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
- Department of Mathematics and Statistics, College of Science, De La Salle University, Manila, Philippines
| | - Kyle Christian R Santiago
- Bioinformatics Lab, Advanced Research Institute for Informatics, Computing, and Networking (AdRIC), De La Salle University, Manila, Philippines
- Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
|
23
|
van der Graaf-van Bloois L, Wagenaar JA, Zomer AL. RFPlasmid: predicting plasmid sequences from short-read assembly data using machine learning. Microb Genom 2021; 7. [PMID: 34846288 PMCID: PMC8743549 DOI: 10.1099/mgen.0.000683] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 12/25/2022] Open
Abstract
Antimicrobial-resistance (AMR) genes in bacteria are often carried on plasmids and these plasmids can transfer AMR genes between bacteria. For molecular epidemiology purposes and risk assessment, it is important to know whether the genes are located on highly transferable plasmids or in the more stable chromosomes. However, draft whole-genome sequences are fragmented, making it difficult to discriminate plasmid and chromosomal contigs. Current methods that predict plasmid sequences from draft genome sequences rely on single features, like k-mer composition, circularity of the DNA molecule, copy number or sequence identity to plasmid replication genes, all of which have their drawbacks, especially when faced with large single-copy plasmids, which often carry resistance genes. With our newly developed prediction tool RFPlasmid, we use a combination of multiple features, including k-mer composition and databases with plasmid and chromosomal marker proteins, to predict whether the likely source of a contig is plasmid or chromosomal. The tool RFPlasmid supports models for 17 different bacterial taxa, including Campylobacter, Escherichia coli and Salmonella, and has a taxon agnostic model for metagenomic assemblies or unsupported organisms. RFPlasmid is available both as a standalone tool and via a web interface.
Affiliation(s)
- Linda van der Graaf-van Bloois
- Faculty of Veterinary Medicine, Department of Infectious Diseases and Immunology, Utrecht University, Utrecht, The Netherlands
- WHO Collaborating Centre for Reference and Research on Campylobacter and Antimicrobial Resistance from an One Health Perspective/OIE Reference Laboratory for Campylobacteriosis, Utrecht, The Netherlands
| | - Jaap A Wagenaar
- Faculty of Veterinary Medicine, Department of Infectious Diseases and Immunology, Utrecht University, Utrecht, The Netherlands
- WHO Collaborating Centre for Reference and Research on Campylobacter and Antimicrobial Resistance from an One Health Perspective/OIE Reference Laboratory for Campylobacteriosis, Utrecht, The Netherlands
- Wageningen Bioveterinary Research, Lelystad, The Netherlands
| | - Aldert L Zomer
- Faculty of Veterinary Medicine, Department of Infectious Diseases and Immunology, Utrecht University, Utrecht, The Netherlands
- WHO Collaborating Centre for Reference and Research on Campylobacter and Antimicrobial Resistance from an One Health Perspective/OIE Reference Laboratory for Campylobacteriosis, Utrecht, The Netherlands
|
24
|
Kiguchi Y, Nishijima S, Kumar N, Hattori M, Suda W. Long-read metagenomics of multiple displacement amplified DNA of low-biomass human gut phageomes by SACRA pre-processing chimeric reads. DNA Res 2021; 28:6377780. [PMID: 34586399 DOI: 10.1093/dnares/dsab019] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/16/2020] [Indexed: 01/21/2023] Open
Abstract
The human gut bacteriophage community (phageome) plays an important role in the host's health and disease; however, the entire structure is poorly understood, partly owing to the generation of many incomplete genomes in conventional short-read metagenomics. Here, we show long-read metagenomics of amplified DNA of low-biomass phageomes with multiple displacement amplification (MDA), involving the development of a novel bioinformatics tool, split amplified chimeric read algorithm (SACRA), that efficiently pre-processed numerous chimeric reads generated through MDA. Using five samples, SACRA markedly reduced the average chimera ratio from 72% to 1.5% in PacBio reads with an average length of 1.8 kb. De novo assembly of chimera-less PacBio long reads reconstructed contigs of ≥5 kb with an average proportion of 27%, which was 1% in contigs from MiSeq short reads, thereby dramatically improving contig length and genome completeness. Comparison of PacBio and MiSeq contigs found MiSeq contig fragmentations frequently near local repeats and hypervariable regions in the phage genomes, and those caused by multiple homologous phage genomes coexisting in the community. We also developed a reference-independent method to assess the completeness of the linear phage genomes. Overall, we established a SACRA-coupled long-read metagenomics robust to highly diverse gut phageomes, identifying high-quality circular and linear phage genomes with adequate sequence quantity.
Affiliation(s)
- Yuya Kiguchi
- Cooperative Major in Advanced Health Science, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo 169-8555, Japan
- Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
| | - Suguru Nishijima
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo 169-8555, Japan
- Integrated Institute for Regulatory Science, Waseda University, Tokyo 169-8555, Japan
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Naveen Kumar
- Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
| | - Masahira Hattori
- Cooperative Major in Advanced Health Science, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
| | - Wataru Suda
- Laboratory for Microbiome Sciences, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
|
25
|
Srivathsan A, Lee L, Katoh K, Hartop E, Kutty SN, Wong J, Yeo D, Meier R. ONTbarcoder and MinION barcodes aid biodiversity discovery and identification by everyone, for everyone. BMC Biol 2021; 19:217. [PMID: 34587965 PMCID: PMC8479912 DOI: 10.1186/s12915-021-01141-x] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 05/11/2021] [Accepted: 09/03/2021] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND: DNA barcodes are a useful tool for discovering, understanding, and monitoring biodiversity which are critical tasks at a time of rapid biodiversity loss. However, widespread adoption of barcodes requires cost-effective and simple barcoding methods. We here present a workflow that satisfies these conditions. It was developed via "innovation through subtraction" and thus requires minimal lab equipment, can be learned within days, reduces the barcode sequencing cost to < 10 cents, and allows fast turnaround from specimen to sequence by using the portable MinION sequencer. RESULTS: We describe how tagged amplicons can be obtained and sequenced with the real-time MinION sequencer in many settings (field stations, biodiversity labs, citizen science labs, schools). We also provide amplicon coverage recommendations that are based on several runs of the latest generation of MinION flow cells ("R10.3") which suggest that each run can generate barcodes for > 10,000 specimens. Next, we present a novel software, ONTbarcoder, which overcomes the bioinformatics challenges posed by MinION reads. The software is compatible with Windows 10, Macintosh, and Linux, has a graphical user interface (GUI), and can generate thousands of barcodes on a standard laptop within hours based on only two input files (FASTQ, demultiplexing file). We document that MinION barcodes are virtually identical to Sanger and Illumina barcodes for the same specimens (> 99.99%) and provide evidence that MinION flow cells and reads have improved rapidly since 2018. CONCLUSIONS: We propose that barcoding with MinION is the way forward for government agencies, universities, museums, and schools because it combines low consumable and capital cost with scalability. Small projects can use the flow cell dongle ("Flongle") while large projects can rely on MinION flow cells that can be stopped and re-used after collecting sufficient data for a given project.
Affiliation(s)
- Amrita Srivathsan
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
| | - Leshon Lee
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
| | - Kazutaka Katoh
- Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
- Artificial Intelligence Research Center, AIST, Tokyo, Japan
| | - Emily Hartop
- Zoology Department, Stockholms Universitet, Stockholm, Sweden
- Station Linné, Öland, Sweden
| | - Sujatha Narayanan Kutty
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Tropical Marine Science Institute, National University of Singapore, Singapore, Singapore
| | - Johnathan Wong
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
| | - Darren Yeo
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
| | - Rudolf Meier
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore.
- Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Center for Integrative Biodiversity Discovery, Berlin, Germany.
|
26
|
Hufford MB, Seetharam AS, Woodhouse MR, Chougule KM, Ou S, Liu J, Ricci WA, Guo T, Olson A, Qiu Y, Della Coletta R, Tittes S, Hudson AI, Marand AP, Wei S, Lu Z, Wang B, Tello-Ruiz MK, Piri RD, Wang N, Kim DW, Zeng Y, O'Connor CH, Li X, Gilbert AM, Baggs E, Krasileva KV, Portwood JL, Cannon EKS, Andorf CM, Manchanda N, Snodgrass SJ, Hufnagel DE, Jiang Q, Pedersen S, Syring ML, Kudrna DA, Llaca V, Fengler K, Schmitz RJ, Ross-Ibarra J, Yu J, Gent JI, Hirsch CN, Ware D, Dawe RK. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science 2021; 373:655-662. [PMID: 34353948 PMCID: PMC8733867 DOI: 10.1126/science.abg5289] [Citation(s) in RCA: 329] [Impact Index Per Article: 82.3] [Received: 01/14/2021] [Accepted: 06/24/2021] [Indexed: 12/24/2022]
Abstract
We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.
Affiliation(s)
- Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Arun S Seetharam
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Genome Informatics Facility, Iowa State University, Ames, IA 50011, USA
| | - Margaret R Woodhouse
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | | | - Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Jianing Liu
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
| | - William A Ricci
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Tingting Guo
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
| | - Andrew Olson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Yinjie Qiu
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Rafael Della Coletta
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Silas Tittes
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | - Asher I Hudson
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | | | - Sharon Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Zhenyuan Lu
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Bo Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | | | - Rebecca D Piri
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| | - Na Wang
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Dong Won Kim
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Yibing Zeng
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
| | - Christine H O'Connor
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN 55108, USA
| | - Xianran Li
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
| | - Amanda M Gilbert
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Erin Baggs
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - Ksenia V Krasileva
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - John L Portwood
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | - Ethalinda K S Cannon
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | - Carson M Andorf
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
| | - Nancy Manchanda
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Samantha J Snodgrass
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - David E Hufnagel
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA, 50010, USA
| | - Qiuhan Jiang
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Sarah Pedersen
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Michael L Syring
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - David A Kudrna
- Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
| | | | | | - Robert J Schmitz
- Department of Genetics, University of Georgia, Athens, GA 30602, USA
| | - Jeffrey Ross-Ibarra
- Center for Population Biology, University of California, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Genome Center, University of California, Davis, CA 95616, USA
| | - Jianming Yu
- Department of Agronomy, Iowa State University, Ames, IA 50011, USA
| | - Jonathan I Gent
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
| | - Doreen Ware
- USDA-ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - R Kelly Dawe
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA.
|
27
|
Ono Y, Asai K, Hamada M. PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores. Bioinformatics 2021; 37:589-595. [PMID: 32976553 PMCID: PMC8097687 DOI: 10.1093/bioinformatics/btaa835] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 06/22/2020] [Revised: 08/20/2020] [Accepted: 09/11/2020] [Indexed: 12/21/2022] Open
Abstract
Motivation: Recent advances in high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream analyses using long reads. Many useful simulators, which characterize long-read error patterns and simulate them, have been developed. However, there is still room for improvement in the simulation of the non-uniformity of errors. Results: To capture characteristics of errors in reads for long-read sequencers, here, we introduce a generative model for quality scores, in which a hidden Markov Model with a latest model selection method, called factorized information criteria, is utilized. We evaluated our developed simulator from various points, indicating that our simulator successfully simulates reads that are consistent with real reads. Availability and implementation: The source codes of PBSIM2 are freely available from https://github.com/yukiteruono/pbsim2. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yukiteru Ono
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
| | - Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8561, Japan
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
| | - Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Institute for Medical-oriented Structural Biology, Waseda University, Tokyo 162-8480, Japan
- Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
|
28
|
Kodera C, Just J, Da Rocha M, Larrieu A, Riglet L, Legrand J, Rozier F, Gaude T, Fobis-Loisy I. The molecular signatures of compatible and incompatible pollination in Arabidopsis. BMC Genomics 2021; 22:268. [PMID: 33853522 PMCID: PMC8048354 DOI: 10.1186/s12864-021-07503-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/21/2020] [Accepted: 03/02/2021] [Indexed: 12/30/2022] Open
Abstract
Background: Fertilization in flowering plants depends on the early contact and acceptance of pollen grains by the receptive papilla cells of the stigma. Deciphering the specific transcriptomic response of both pollen and stigmatic cells during their interaction constitutes an important challenge to better our understanding of this cell recognition event. Results: Here we describe a transcriptomic analysis based on single nucleotide polymorphisms (SNPs) present in two Arabidopsis thaliana accessions, one used as female and the other as male. This strategy allowed us to distinguish 80% of transcripts according to their parental origins. We also developed a tool which predicts male/female specific expression for genes without SNP. We report an unanticipated transcriptional activity triggered in stigma upon incompatible pollination and show that following compatible interaction, components of the pattern-triggered immunity (PTI) pathway are induced on the female side. Conclusions: Our work unveils the molecular signatures of compatible and incompatible pollinations both at the male and female side. We provide invaluable resource and tools to identify potential new molecular players involved in pollen-stigma interaction. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07503-7.
Affiliation(s)
- Chie Kodera
- Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France
- Present Address: Institut Jean-Pierre Bourgin, INRAE, AgroParisTech, Université Paris-Saclay, 78000, Versailles, France
| | - Jérémy Just
- Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France
| | - Martine Da Rocha
- INRAE, Université Côte d'Azur, CNRS, ISA 400 route des Chappes BP 167, F-06903, Sophia Antipolis Cedex, France
| | - Antoine Larrieu
- Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France
- Present Address: Centre for Plant Sciences, Faculty of Biological Sciences, University of Leeds, Leeds, UK
| | - Lucie Riglet
- Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France
- Present Address: Sainsbury Laboratory, Cambridge University, Cambridge, CB2 1LR, UK
| | - Jonathan Legrand
- Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France
| | - Frédérique Rozier
- Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France
| | - Thierry Gaude
- Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France
| | - Isabelle Fobis-Loisy
- Laboratoire Reproduction et Développement des Plantes, Univ Lyon, ENS de Lyon, UCB Lyon 1, CNRS, INRAE, Inria, F-69342, Lyon, France.
|
29
|
Bliznina A, Masunaga A, Mansfield MJ, Tan Y, Liu AW, West C, Rustagi T, Chien HC, Kumar S, Pichon J, Plessy C, Luscombe NM. Telomere-to-telomere assembly of the genome of an individual Oikopleura dioica from Okinawa using Nanopore-based sequencing. BMC Genomics 2021; 22:222. [PMID: 33781200 PMCID: PMC8008620 DOI: 10.1186/s12864-021-07512-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/21/2020] [Accepted: 03/05/2021] [Indexed: 11/10/2022] Open
Abstract
Background: The larvacean Oikopleura dioica is an abundant tunicate plankton with the smallest (65–70 Mbp) non-parasitic, non-extremophile animal genome identified to date. Currently, there are two genomes available for the Bergen (OdB3) and Osaka (OSKA2016) O. dioica laboratory strains. Both assemblies have full genome coverage and high sequence accuracy. However, a chromosome-scale assembly has not yet been achieved. Results: Here, we present a chromosome-scale genome assembly (OKI2018_I69) of the Okinawan O. dioica produced using long-read Nanopore and short-read Illumina sequencing data from a single male, combined with Hi-C chromosomal conformation capture data for scaffolding. The OKI2018_I69 assembly has a total length of 64.3 Mbp distributed among 19 scaffolds. 99% of the assembly is contained within five megabase-scale scaffolds. We found telomeres on both ends of the two largest scaffolds, which represent assemblies of two fully contiguous autosomal chromosomes. Each of the other three large scaffolds have telomeres at one end only and we propose that they correspond to sex chromosomes split into a pseudo-autosomal region and X-specific or Y-specific regions. Indeed, these five scaffolds mostly correspond to equivalent linkage groups in OdB3, suggesting overall agreement in chromosomal organization between the two populations. At a more detailed level, the OKI2018_I69 assembly possesses similar genomic features in gene content and repetitive elements reported for OdB3. The Hi-C map suggests few reciprocal interactions between chromosome arms. At the sequence level, multiple genomic features such as GC content and repetitive elements are distributed differently along the short and long arms of the same chromosome. Conclusions: We show that a hybrid approach of integrating multiple sequencing technologies with chromosome conformation information results in an accurate de novo chromosome-scale assembly of O. dioica’s highly polymorphic genome. This genome assembly opens up the possibility of cross-genome comparison between O. dioica populations, as well as of studies of chromosomal evolution in this lineage. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07512-6.
Affiliation(s)
- Aleksandra Bliznina
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.
| | - Aki Masunaga
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Michael J Mansfield
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Yongkai Tan
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Andrew W Liu
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Charlotte West
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Francis Crick Institute, London, UK
| | - Tanmay Rustagi
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Hsiao-Chiao Chien
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Saurabh Kumar
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Julien Pichon
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
| | - Charles Plessy
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.
| | - Nicholas M Luscombe
- Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan.,Francis Crick Institute, London, UK.,Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London, UK
| |
Collapse
|
30
|
Mitsuhashi S, Nakagawa S, Sasaki-Honda M, Sakurai H, Frith MC, Mitsuhashi H. Nanopore direct RNA sequencing detects DUX4-activated repeats and isoforms in human muscle cells. Hum Mol Genet 2021; 30:552-563. [PMID: 33693705 PMCID: PMC8120133 DOI: 10.1093/hmg/ddab063] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/27/2021] [Revised: 01/27/2021] [Accepted: 02/23/2021] [Indexed: 01/11/2023] Open
Abstract
Facioscapulohumeral muscular dystrophy (FSHD) is an inherited muscle disease caused by misexpression of the DUX4 gene in skeletal muscle. DUX4 is a transcription factor, which is normally expressed in the cleavage-stage embryo and regulates gene expression involved in early embryonic development. Recent studies revealed that DUX4 also activates the transcription of repetitive elements such as endogenous retroviruses (ERVs), mammalian apparent long terminal repeat (LTR)-retrotransposons and pericentromeric satellite repeats (Human Satellite II). DUX4-bound ERV sequences also create alternative promoters for genes or long non-coding RNAs, producing fusion transcripts. To further understand transcriptional regulation by DUX4, we performed nanopore long-read direct RNA sequencing (dRNA-seq) of human muscle cells induced by DUX4, because long reads show whole isoforms with greater confidence. We successfully detected differential expression of known DUX4-induced genes and discovered 61 differentially expressed repeat loci, which are near DUX4–ChIP peaks. We also identified 247 gene–ERV fusion transcripts, of which 216 were not reported previously. In addition, long-read dRNA-seq clearly shows that RNA splicing is a common event in DUX4-activated ERV transcripts. Long-read analysis showed non-LTR transposons including Alu elements are also transcribed from LTRs. Our findings revealed further complexity of DUX4-induced ERV transcripts. This catalogue of DUX4-activated repetitive elements may provide useful information to elucidate the pathology of FSHD. Also, our results indicate that nanopore dRNA-seq has complementary strengths to conventional short-read complementary DNA sequencing.
Affiliation(s)
- Satomi Mitsuhashi
  - Department of Genomic Function and Diversity, Tokyo Medical and Dental University, Tokyo 113-8510, Japan
  - Department of Human Genetics, Yokohama City University, Yokohama, Kanagawa 236-0004, Japan
- So Nakagawa
  - Micro/Nano Technology Center, Tokai University, Hiratsuka, Kanagawa 259-1292, Japan
  - Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1193, Japan
- Mitsuru Sasaki-Honda
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, 53 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
- Hidetoshi Sakurai
  - Center for iPS Cell Research and Application (CiRA), Kyoto University, 53 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
- Martin C Frith
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8561, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Hiroaki Mitsuhashi
  - Micro/Nano Technology Center, Tokai University, Hiratsuka, Kanagawa 259-1292, Japan
  - Department of Applied Biochemistry, School of Engineering, Tokai University, Hiratsuka, Kanagawa 259-1292, Japan
|
31
|
Bryzghalov O, Makałowska I, Szcześniak MW. lncEvo: automated identification and conservation study of long noncoding RNAs. BMC Bioinformatics 2021; 22:59. [PMID: 33563213 PMCID: PMC7871587 DOI: 10.1186/s12859-021-03991-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/09/2020] [Accepted: 02/01/2021] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND Long noncoding RNAs represent a large class of transcripts with two common features: they exceed an arbitrary length threshold of 200 nt and are assumed to not encode proteins. Although a growing body of evidence indicates that the vast majority of lncRNAs are potentially nonfunctional, hundreds of them have already been revealed to perform essential gene regulatory functions or to be linked to a number of cellular processes, including those associated with the etiology of human diseases. To better understand the biology of lncRNAs, it is essential to perform a more in-depth study of their evolution. In contrast to protein-encoding transcripts, however, they do not show the strong sequence conservation that usually results from purifying selection; therefore, software that is typically used to resolve the evolutionary relationships of protein-encoding genes and transcripts is not applicable to the study of lncRNAs. RESULTS To tackle this issue, we developed lncEvo, a computational pipeline that consists of three modules: (1) transcriptome assembly from RNA-Seq data, (2) prediction of lncRNAs, and (3) conservation study-a genome-wide comparison of lncRNA transcriptomes between two species of interest, including search for orthologs. Importantly, one can choose to apply lncEvo solely for transcriptome assembly or lncRNA prediction, without calling the conservation-related part. CONCLUSIONS lncEvo is an all-in-one tool built with the Nextflow framework, utilizing state-of-the-art software and algorithms with customizable trade-offs between speed and sensitivity, ease of use and built-in reporting functionalities. The source code of the pipeline is freely available for academic and nonacademic use under the MIT license at https://gitlab.com/spirit678/lncrna_conservation_nf .
Affiliation(s)
- Oleksii Bryzghalov
  - Institute of Human Biology and Evolution, Faculty of Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Izabela Makałowska
  - Institute of Human Biology and Evolution, Faculty of Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Michał Wojciech Szcześniak
  - Institute of Human Biology and Evolution, Faculty of Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
|
32
|
Masutani B, Arimura SI, Morishita S. Investigating the mitochondrial genomic landscape of Arabidopsis thaliana by long-read sequencing. PLoS Comput Biol 2021; 17:e1008597. [PMID: 33434206 PMCID: PMC7833223 DOI: 10.1371/journal.pcbi.1008597] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/23/2020] [Revised: 01/25/2021] [Accepted: 12/01/2020] [Indexed: 11/18/2022] Open
Abstract
Plant mitochondrial genomes have distinctive features compared to those of animals; namely, they are large and divergent, with sizes ranging from hundreds of thousands of to a few million bases. Recombination among repetitive regions is thought to produce similar structures that differ slightly, known as "multipartite structures," which contribute to different phenotypes. Although many reference plant mitochondrial genomes represent almost all the genes in mitochondria, the full spectrum of their structures remains largely unknown. The emergence of long-read sequencing technology is expected to yield this landscape; however, many studies aimed to assemble only one representative circular genome, because properly understanding multipartite structures using existing assemblers is not feasible. To elucidate multipartite structures, we leveraged the information in existing reference genomes and classified long reads according to their corresponding structures. We developed a method that exploits two classic algorithms, partial order alignment (POA) and the hidden Markov model (HMM) to construct a sensitive read classifier. This method enables us to represent a set of reads as a POA graph and analyze it using the HMM. We can then calculate the likelihood of a read occurring in a given cluster, resulting in an iterative clustering algorithm. For synthetic data, our proposed method reliably detected one variation site out of 9,000-bp synthetic long reads with a 15% sequencing-error rate and produced accurate clustering. It was also capable of clustering long reads from six very similar sequences containing only slight differences. For real data, we assembled putative multipartite structures of mitochondrial genomes of Arabidopsis thaliana from nine accessions sequenced using PacBio Sequel. The results indicated that there are recurrent and strain-specific structures in A. thaliana mitochondrial genomes.
Affiliation(s)
- Bansho Masutani
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Shin-ichi Arimura
  - Laboratory of Plant Molecular Genetics, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Shinichi Morishita
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
|
33
|
Abstract
Long DNA and RNA reads from nanopore and PacBio technologies have many applications, but the raw reads have a substantial error rate. More accurate sequences can be obtained by merging multiple reads from overlapping parts of the same sequence. lamassemble aligns up to ∼1000 reads to each other, and makes a consensus sequence, which is often much more accurate than the raw reads. It is useful for studying a region of interest such as an expanded tandem repeat or other disease-causing mutation.
|
34
|
Asalone KC, Ryan KM, Yamadi M, Cohen AL, Farmer WG, George DJ, Joppert C, Kim K, Mughal MF, Said R, Toksoz-Exley M, Bisk E, Bracht JR. Regional sequence expansion or collapse in heterozygous genome assemblies. PLoS Comput Biol 2020; 16:e1008104. [PMID: 32735589 PMCID: PMC7423139 DOI: 10.1371/journal.pcbi.1008104] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/01/2019] [Revised: 08/12/2020] [Accepted: 06/29/2020] [Indexed: 12/13/2022] Open
Abstract
High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet is common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by assembly methodologies tested, the assemblies generated varied widely in fragmentation level and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies, and result from difficult-to-detect local sequence expansion or contractions. Given the unpredictable interplay between assembly algorithm, parameter, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than N50 value, including methods for assessing local expansion and collapse. In the genomic era, genomes must be reconstructed from fragments using computational methods, or assemblers. How do we know that a new genome assembly is correct? This is important because errors in assembly can lead to downstream problems in gene predictions and these inaccurate results can contaminate databases, affecting later comparative studies. A particular challenge occurs when a diploid organism inherits two highly divergent genome copies from its parents. While it is widely appreciated that this type of data is difficult for assemblers to handle properly, here we show that the process is prone to more errors than previously appreciated. 
Specifically, we document examples of regional expansion and collapse, affecting downstream gene prediction accuracy, but without changing the overall genome assembly size or other metrics of accuracy. Our results suggest that assembly evaluation methods should be altered to identify whether regional expansions and collapses are present in the genome assembly.
Affiliation(s)
- Kathryn C. Asalone
  - Biology Department, American University, Washington DC, United States of America
- Kara M. Ryan
  - Biology Department, American University, Washington DC, United States of America
- Maryam Yamadi
  - Biology Department, American University, Washington DC, United States of America
- Annastelle L. Cohen
  - Biology Department, American University, Washington DC, United States of America
- William G. Farmer
  - Biology Department, American University, Washington DC, United States of America
- Deborah J. George
  - Biology Department, American University, Washington DC, United States of America
- Claudia Joppert
  - Biology Department, American University, Washington DC, United States of America
- Kaitlyn Kim
  - Biology Department, American University, Washington DC, United States of America
- Madeeha Froze Mughal
  - Biology Department, American University, Washington DC, United States of America
- Rana Said
  - Biology Department, American University, Washington DC, United States of America
- Metin Toksoz-Exley
  - Mathematics and Statistics Department, American University, Washington DC, United States of America
- Evgeny Bisk
  - Office of Information Technology, American University, Washington DC, United States of America
- John R. Bracht
  - Biology Department, American University, Washington DC, United States of America
|
35
|
Mitsuhashi S, Ohori S, Katoh K, Frith MC, Matsumoto N. A pipeline for complete characterization of complex germline rearrangements from long DNA reads. Genome Med 2020; 12:67. [PMID: 32731881 PMCID: PMC7393826 DOI: 10.1186/s13073-020-00762-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/31/2020] [Accepted: 07/10/2020] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Many genetic/genomic disorders are caused by genomic rearrangements. Standard methods can often characterize these variations only partly, e.g., copy number changes or breakpoints. It is important to fully understand the order and orientation of rearranged fragments, with precise breakpoints, to know the pathogenicity of the rearrangements. METHODS We performed whole-genome-coverage nanopore sequencing of long DNA reads from four patients with chromosomal translocations. We identified rearrangements relative to a reference human genome, subtracted rearrangements shared by any of 33 control individuals, and determined the order and orientation of rearranged fragments, with our newly developed analysis pipeline. RESULTS We describe the full characterization of complex chromosomal rearrangements, by filtering out genomic rearrangements seen in controls without the same disease, reducing the number of loci per patient from a few thousand to a few dozen. Breakpoint detection was very accurate; we usually see ~ 0 ± 1 base difference from Sanger sequencing-confirmed breakpoints. For one patient with two reciprocal chromosomal translocations, we find that the translocation points have complex rearrangements of multiple DNA fragments involving 5 chromosomes, which we could order and orient by an automatic algorithm, thereby fully reconstructing the rearrangement. A rearrangement is more than the sum of its parts: some properties, such as sequence loss, can be inferred only after reconstructing the whole rearrangement. In this patient, the rearrangements were evidently caused by shattering of the chromosomes into multiple fragments, which rejoined in a different order and orientation with loss of some fragments. CONCLUSIONS We developed an effective analytic pipeline to find chromosomal aberration in congenital diseases by filtering benign changes, only from long read sequencing. 
Our algorithm for reconstruction of complex rearrangements is useful to interpret rearrangements with many breakpoints, e.g., chromothripsis. Our approach promises to fully characterize many congenital germline rearrangements, provided they do not involve poorly understood loci such as centromeric repeats.
Affiliation(s)
- Satomi Mitsuhashi
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan
- Sachiko Ohori
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan
- Kazutaka Katoh
  - Research Institute for Microbial Diseases, Osaka University, Suita, Japan
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo, 135-0064, Japan
- Martin C Frith
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo, 135-0064, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo, Japan
- Naomichi Matsumoto
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan
|
36
|
Pollo SMJ, Reiling SJ, Wit J, Workentine ML, Guy RA, Batoff GW, Yee J, Dixon BR, Wasmuth JD. Benchmarking hybrid assemblies of Giardia and prediction of widespread intra-isolate structural variation. Parasit Vectors 2020; 13:108. [PMID: 32111234 PMCID: PMC7048089 DOI: 10.1186/s13071-020-3968-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/03/2019] [Accepted: 02/13/2020] [Indexed: 01/02/2023] Open
Abstract
Background Currently available short read genome assemblies of the tetraploid protozoan parasite Giardia intestinalis are highly fragmented, highlighting the need for improved genome assemblies at a reasonable cost. Long nanopore reads are well suited to resolve repetitive genomic regions resulting in better quality assemblies of eukaryotic genomes. Subsequent addition of highly accurate short reads to long-read assemblies further improves assembly quality. Using this hybrid approach, we assembled genomes for three Giardia isolates, two with published assemblies and one novel, to evaluate the improvement in genome quality gained from long reads. We then used the long reads to predict structural variants to examine this previously unexplored source of genetic variation in Giardia. Methods With MinION reads for each isolate, we assembled genomes using several assemblers specializing in long reads. Assembly metrics, gene finding, and whole genome alignments to the reference genomes enabled direct comparison to evaluate the performance of the nanopore reads. Further improvements from adding Illumina reads to the long-read assemblies were evaluated using gene finding. Structural variants were predicted from alignments of the long reads to the best hybrid genome for each isolate and enrichment of key genes was analyzed using random genome sampling and calculation of percentiles to find thresholds of significance. Results Our hybrid assembly method generated reference quality genomes for each isolate. Consistent with previous findings based on SNPs, examination of heterozygosity using the structural variants found that Giardia BGS was considerably more heterozygous than the other isolates that are from Assemblage A. Further, each isolate was shown to contain structural variant regions enriched for variant-specific surface proteins, a key class of virulence factor in Giardia. 
Conclusions The ability to generate reference quality genomes from a single MinION run and a multiplexed MiSeq run enables future large-scale comparative genomic studies within the genus Giardia. Further, prediction of structural variants from long reads allows for more in-depth analyses of major sources of genetic variation within and between Giardia isolates that could have effects on both pathogenicity and host range.
Affiliation(s)
- Stephen M J Pollo
  - Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
  - Host-Parasite Interactions Training Program, University of Calgary, Calgary, AB, Canada
- Sarah J Reiling
  - Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
- Janneke Wit
  - Host-Parasite Interactions Training Program, University of Calgary, Calgary, AB, Canada
  - Department of Comparative Biology and Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
- Matthew L Workentine
  - Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
- Rebecca A Guy
  - Division of Enteric Diseases, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON, Canada
- G William Batoff
  - Department of Biology, Biochemistry and Molecular Biology Program, Trent University, Peterborough, ON, Canada
- Janet Yee
  - Department of Biology, Biochemistry and Molecular Biology Program, Trent University, Peterborough, ON, Canada
- Brent R Dixon
  - Bureau of Microbial Hazards, Food Directorate, Health Canada, Ottawa, ON, Canada
- James D Wasmuth
  - Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
  - Host-Parasite Interactions Training Program, University of Calgary, Calgary, AB, Canada
|
37
|
Long-read sequencing for rare human genetic diseases. J Hum Genet 2019; 65:11-19. [PMID: 31558760 DOI: 10.1038/s10038-019-0671-8] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 05/31/2019] [Revised: 08/28/2019] [Accepted: 09/03/2019] [Indexed: 12/19/2022]
Abstract
During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is <50% using these approaches, and there remain many rare genetic diseases with unknown cause. There may be many reasons for this, but one plausible explanation is that the responsible mutations are in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansion or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a shortage of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. The results of these studies provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and diseases. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.
|
38
|
Rubin BER, Jones BM, Hunt BG, Kocher SD. Rate variation in the evolution of non-coding DNA associated with social evolution in bees. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180247. [PMID: 31154980 PMCID: PMC6560270 DOI: 10.1098/rstb.2018.0247] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Accepted: 02/14/2019] [Indexed: 11/12/2022] Open
Abstract
The evolutionary origins of eusociality represent increases in complexity from individual to caste-based, group reproduction. These behavioural transitions have been hypothesized to go hand in hand with an increased ability to regulate when and where genes are expressed. Bees have convergently evolved eusociality up to five times, providing a framework to test this hypothesis. To examine potential links between putative gene regulatory elements and social evolution, we compare alignable, non-coding sequences in 11 diverse bee species, encompassing three independent origins of reproductive division of labour and two elaborations of eusocial complexity. We find that rates of evolution in a number of non-coding sequences correlate with key social transitions in bees. Interestingly, while we find little evidence for convergent rate changes associated with independent origins of social behaviour, a number of molecular pathways exhibit convergent rate changes in conjunction with subsequent elaborations of social organization. We also present evidence that many novel non-coding regions may have been recruited alongside the origin of sociality in corbiculate bees; these loci could represent gene regulatory elements associated with division of labour within this group. Thus, our findings are consistent with the hypothesis that gene regulatory innovations are associated with the evolution of eusociality and illustrate how a thorough examination of both coding and non-coding sequence can provide a more complete understanding of the molecular mechanisms underlying behavioural evolution. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
Affiliation(s)
- Benjamin E. R. Rubin
  - Department of Ecology and Evolutionary Biology; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Beryl M. Jones
  - Program in Ecology, Evolution, and Conservation Biology, University of Illinois, Urbana, IL, USA
- Brendan G. Hunt
  - Department of Entomology, University of Georgia, Griffin, GA, USA
- Sarah D. Kocher
  - Department of Ecology and Evolutionary Biology; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
|
39
|
Frith MC, Khan S. A survey of localized sequence rearrangements in human DNA. Nucleic Acids Res 2018; 46:1661-1673. [PMID: 29272440 PMCID: PMC5829575 DOI: 10.1093/nar/gkx1266] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/30/2017] [Accepted: 12/07/2017] [Indexed: 01/29/2023] Open
Abstract
Genomes mutate and evolve in ways simple (substitution or deletion of bases) and complex (e.g. chromosome shattering). We do not fully understand what types of complex mutation occur, and we cannot routinely characterize arbitrarily-complex mutations in a high-throughput, genome-wide manner. Long-read DNA sequencing methods (e.g. PacBio, nanopore) are promising for this task, because one read may encompass a whole complex mutation. We describe an analysis pipeline to characterize arbitrarily-complex 'local' mutations, i.e. intrachromosomal mutations encompassed by one DNA read. We apply it to nanopore and PacBio reads from one human cell line (NA12878), and survey sequence rearrangements, both real and artifactual. Almost all the real rearrangements belong to recurring patterns or motifs: the most common is tandem multiplication (e.g. heptuplication), but there are also complex patterns such as localized shattering, which resembles DNA damage by radiation. Gene conversions are identified, including one between hemoglobin gamma genes. This study demonstrates a way to find intricate rearrangements with any number of duplications, deletions, and repositionings. It demonstrates a probability-based method to resolve ambiguous rearrangements involving highly similar sequences, as occurs in gene conversion. We present a catalog of local rearrangements in one human cell line, and show which rearrangement patterns occur.
Affiliation(s)
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
- Sofia Khan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
|
40
|
De Coster W, De Rijk P, De Roeck A, De Pooter T, D'Hert S, Strazisar M, Sleegers K, Van Broeckhoven C. Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome. Genome Res 2019; 29:1178-1187. [PMID: 31186302 PMCID: PMC6633254 DOI: 10.1101/gr.244939.118] [Citation(s) in RCA: 97] [Impact Index Per Article: 16.2] [Received: 10/22/2018] [Accepted: 06/06/2019] [Indexed: 01/17/2023]
Abstract
We sequenced the genome of the Yoruban reference individual NA19240 on the long-read sequencing platform Oxford Nanopore PromethION for evaluation and benchmarking of recently published aligners and germline structural variant calling tools, as well as a comparison with the performance of structural variant calling from short-read sequencing data. The structural variant caller Sniffles after NGMLR or minimap2 alignment provides the most accurate results, but additional confidence or sensitivity can be obtained by a combination of multiple variant callers. Sensitive and fast results can be obtained by minimap2 for alignment and a combination of Sniffles and SVIM for variant identification. We describe a scalable workflow for identification, annotation, and characterization of tens of thousands of structural variants from long-read genome sequencing of an individual or population. By discussing the results of this well-characterized reference individual, we provide an approximation of what can be expected in future long-read sequencing studies aiming for structural variant identification.
Affiliation(s)
- Wouter De Coster
- Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
| | - Peter De Rijk
- Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Neuromics Support Facility, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
| | - Arne De Roeck
- Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
| | - Tim De Pooter
- Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Neuromics Support Facility, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
| | - Svenn D'Hert
- Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Neuromics Support Facility, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
| | - Mojca Strazisar
- Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
- Neuromics Support Facility, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
| | - Kristel Sleegers
- Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
| | - Christine Van Broeckhoven
- Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, 2610 Antwerp, Belgium
- Biomedical Sciences, University of Antwerp, 2610 Antwerp, Belgium
| |
|
41
|
Mitsuhashi S, Frith MC, Mizuguchi T, Miyatake S, Toyota T, Adachi H, Oma Y, Kino Y, Mitsuhashi H, Matsumoto N. Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads. Genome Biol 2019; 20:58. [PMID: 30890163 PMCID: PMC6425644 DOI: 10.1186/s13059-019-1667-6] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Received: 08/14/2018] [Accepted: 03/01/2019] [Indexed: 01/03/2023] Open
Abstract
Tandemly repeated DNA is highly mutable and causes at least 31 diseases, but it is hard to detect pathogenic repeat expansions genome-wide. Here, we report robust detection of human repeat expansions from careful alignments of long but error-prone (PacBio and nanopore) reads to a reference genome. Our method is robust to systematic sequencing errors, inexact repeats with fuzzy boundaries, and low sequencing coverage. By comparing to healthy controls, we prioritize pathogenic expansions within the top 10 out of 700,000 tandem repeats in whole genome sequencing data. This may help to elucidate the many genetic diseases whose causes remain unknown.
Affiliation(s)
- Satomi Mitsuhashi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan.
| | - Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo, 135-0064, Japan.
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan.
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Shinjuku-ku, Tokyo, Japan.
| | - Takeshi Mizuguchi
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan
| | - Satoko Miyatake
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan
| | - Tomoko Toyota
- Department of Neurology, University of Occupational and Environmental Health School of Medicine, Kitakyushu, Fukuoka, Japan
| | - Hiroaki Adachi
- Department of Neurology, University of Occupational and Environmental Health School of Medicine, Kitakyushu, Fukuoka, Japan
| | - Yoko Oma
- Department of Liberal Arts, Faculty of Medicine, Saitama Medical University, Iruma, Saitama, Japan
| | - Yoshihiro Kino
- Department of Bioinformatics and Molecular Neuropathology, Meiji Pharmaceutical University, Kiyose, Tokyo, Japan
| | - Hiroaki Mitsuhashi
- Department of Applied Biochemistry, School of Engineering, Tokai University, Hiratsuka, Kanagawa, Japan
| | - Naomichi Matsumoto
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan
| |
|
42
|
Seki M, Katsumata E, Suzuki A, Sereewattanawoot S, Sakamoto Y, Mizushima-Sugano J, Sugano S, Kohno T, Frith MC, Tsuchihara K, Suzuki Y. Evaluation and application of RNA-Seq by MinION. DNA Res 2019; 26:55-65. [PMID: 30462165 PMCID: PMC6379022 DOI: 10.1093/dnares/dsy038] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 06/22/2018] [Accepted: 10/15/2018] [Indexed: 12/27/2022] Open
Abstract
The current RNA-Seq method analyses fragments of mRNAs, from which it is occasionally difficult to reconstruct the entire transcript structure. Here, we performed and evaluated the recent procedure for full-length cDNA sequencing using the Nanopore sequencer MinION. We applied MinION RNA-Seq for various applications, which would not always be easy using the usual RNA-Seq by Illumina. First, we examined and found that even though the sequencing accuracy was still limited to 92.3%, practically useful RNA-Seq analysis is possible. Particularly, taking advantage of the long-read nature of MinION, we demonstrate the identification of splicing patterns and their combinations as a form of full-length cDNAs without losing precise information concerning their expression levels. Transcripts of fusion genes in cancer cells can also be identified and characterized. Furthermore, the full-length cDNA information can be used for phasing of the SNPs detected by WES on the transcripts, providing essential information to identify allele-specific transcriptional events. We constructed a catalogue of full-length cDNAs in seven major organs for two particular individuals and identified allele-specific transcription and splicing. Finally, we demonstrate that single-cell sequencing is also possible. RNA-Seq on the MinION platform should provide a novel approach that is complementary to the current RNA-Seq.
Affiliation(s)
- Masahide Seki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Eri Katsumata
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Ayako Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Sarun Sereewattanawoot
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Yoshitaka Sakamoto
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Junko Mizushima-Sugano
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Department of Chemistry and Life Science, School of Advanced Engineering, Kogakuin University, Shinjuku-ku, Tokyo, Japan
| | - Sumio Sugano
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Department of Molecular Epidemiology, Medical Research Institute, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
| | - Takashi Kohno
- Division of Genome Biology, National Cancer Center Research Institute, Chuo-Ku, Tokyo, Japan
| | - Martin C Frith
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Artificial Intelligence Research Center AIST, Koto-ku, Tokyo, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Shinjuku-ku, Tokyo, Japan
| | - Katsuya Tsuchihara
- Division of Translational Informatics, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
| | - Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| |
|
43
|
Shabardina V, Kischka T, Manske F, Grundmann N, Frith MC, Suzuki Y, Makałowski W. NanoPipe-a web server for nanopore MinION sequencing data analysis. Gigascience 2019; 8:giy169. [PMID: 30689855 PMCID: PMC6377397 DOI: 10.1093/gigascience/giy169] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 10/10/2018] [Revised: 12/10/2018] [Accepted: 12/23/2018] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND: The fast-moving progress of the third-generation long-read sequencing technologies will soon bring the biological and medical sciences to a new era of research. Altogether, the technique and experimental procedures are becoming more straightforward and available to biologists from diverse fields, even without any profound experience in DNA sequencing. Thus, the introduction of the MinION device by Oxford Nanopore Technologies promises to "bring sequencing technology to the masses" and also allows quick and operative analysis in field studies. However, the convenience of this sequencing technology dramatically contrasts with the available analysis tools, which may significantly reduce enthusiasm of a "regular" user. To really bring the sequencing technology to every biologist, we need a set of user-friendly tools that can perform a powerful analysis in an automatic manner. FINDINGS: NanoPipe was developed in consideration of the specifics of the MinION sequencing technologies, providing accordingly adjusted alignment parameters. The range of the target species/sequences for the alignment is not limited, and the descriptive usage page of NanoPipe helps a user to succeed with NanoPipe analysis. The results contain alignment statistics, consensus sequence, polymorphisms data, and visualization of the alignment. Several test cases are used to demonstrate the efficiency of the tool. CONCLUSIONS: Freely available NanoPipe software allows effortless and reliable analysis of MinION sequencing data for experienced bioinformaticians, as well as for wet-lab biologists with minimum bioinformatics knowledge. Moreover, for the latter group, we describe the basic algorithm necessary for MinION sequencing analysis from the first to last step.
Affiliation(s)
- Victoria Shabardina
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| | - Tabea Kischka
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| | - Felix Manske
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| | - Norbert Grundmann
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| | - Martin C Frith
- Artificial Intelligence Research Center, AIST, 2-3-26, Aomi, Koto-ku, Tokyo, 135-0064, Japan
- Department of Computational Biology and Medical Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
- AIST-Waseda University Computational Bio Big Data Open Innovation Laboratory, 3-4-1 Ookubo, Shinjuku-ku, Tokyo, 169-8555, Japan
| | - Yutaka Suzuki
- Laboratory of Systems Genomics, Department of Computational Biology and Medical Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
| | - Wojciech Makałowski
- Institute of Bioinformatics, University of Muenster, Niels-Stensen-Strasse 14, Muenster, 48149, Germany
| |
|
44
|
Suzuki A, Suzuki M, Mizushima-Sugano J, Frith MC, Makalowski W, Kohno T, Sugano S, Tsuchihara K, Suzuki Y. Sequencing and phasing cancer mutations in lung cancers using a long-read portable sequencer. DNA Res 2018; 24:585-596. [PMID: 29117310 PMCID: PMC5726485 DOI: 10.1093/dnares/dsx027] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 03/05/2017] [Accepted: 05/29/2017] [Indexed: 01/18/2023] Open
Abstract
Here, we employed cDNA amplicon sequencing using a long-read portable sequencer, MinION, to characterize various types of mutations in cancer-related genes, namely, EGFR, KRAS, NRAS and NF1. For homozygous SNVs, the precision and recall rates were 87.5% and 91.3%, respectively. For previously reported hotspot mutations, the precision and recall rates reached 100%. The precise junctions of EML4-ALK, CCDC6-RET and five other gene fusions were also detected. Taking advantage of long-read sequencing, we conducted phasing of EGFR mutations and elucidated the mutational allelic backgrounds of anti-tumor drug-sensitive and resistant mutations, which could provide useful information for selecting therapeutic approaches. In the H1975 cells, 72% of the reads harbored both L858R and T790M mutations, and 22% of the reads harbored neither mutation. To ensure that the clinical requirements can be met in potentially low cancer cell populations, we further conducted a serial dilution analysis of the template for EGFR mutations. Several percent of the mutant alleles could be detected depending on the yield and quality of the sequencing data. Finally, we characterized the mutation genotypes in eight clinical samples. This method could be a convenient long-read sequencing-based analytical approach and thus may change the current approaches used for cancer genome sequencing.
Affiliation(s)
- Ayako Suzuki
- Division of Translational Genomics, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
| | - Mizuto Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Junko Mizushima-Sugano
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Department of Chemistry and Life Science, Kogakuin University, Nishi-Shinjuku, Shinjuku-Ku, Tokyo, Japan
| | - Martin C Frith
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Computational Biology Research Center, The National Institute for Advanced Industrial Science and Technology, Aomi, Koto-Ku, Tokyo, Japan
| | - Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Munster, Germany
| | - Takashi Kohno
- Division of Genome Biology, National Cancer Center Research Institute, Tsukiji, Chuo-Ku, Tokyo, Japan
| | - Sumio Sugano
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| | - Katsuya Tsuchihara
- Division of Translational Genomics, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
| | - Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
| |
|
45
|
Takeda T, Hamada M, Hancock J. Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm. Bioinformatics 2018; 34:576-584. [PMID: 29040374 PMCID: PMC5860613 DOI: 10.1093/bioinformatics/btx643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2017] [Accepted: 10/10/2017] [Indexed: 11/12/2022] Open
Abstract
Motivation: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. Results: We developed a novel method to select superior models (including the number of hidden states) for PHMM. Our method selects models with the highest posterior probability using Factorized Information Criterion, which is widely utilized in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies. Availability and implementation: The software is available at https://github.com/bigsea-t/fab-phmm. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Taikai Takeda
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Institute for Medical-Oriented Structural Biology, Waseda University, Tokyo 162-8480, Japan
- Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
| | | |
|
46
|
Mitsuhashi S, Nakagawa S, Takahashi Ueda M, Imanishi T, Frith MC, Mitsuhashi H. Nanopore-based single molecule sequencing of the D4Z4 array responsible for facioscapulohumeral muscular dystrophy. Sci Rep 2017; 7:14789. [PMID: 29093467 PMCID: PMC5665936 DOI: 10.1038/s41598-017-13712-6] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 07/03/2017] [Accepted: 09/25/2017] [Indexed: 11/16/2022] Open
Abstract
Subtelomeric macrosatellite repeats are difficult to sequence using conventional sequencing methods owing to the high similarity among repeat units and high GC content. Sequencing these repetitive regions is challenging, even with recent improvements in sequencing technologies. Among these repeats, a haplotype carrying a particular sequence and shortening of the D4Z4 array on human chromosome 4q35 causes one of the most prevalent forms of muscular dystrophy with autosomal-dominant inheritance, facioscapulohumeral muscular dystrophy (FSHD). Here, we applied a nanopore-based ultra-long read sequencer to sequence a BAC clone containing 13 D4Z4 repeats and flanking regions. We successfully obtained the whole D4Z4 repeat sequence, including the pathogenic gene DUX4 in the last D4Z4 repeat. The estimated sequence accuracy of the total repeat region was 99.8% based on a comparison with the reference sequence. Errors were typically observed between purine or between pyrimidine bases. Further, we analyzed the D4Z4 sequence from publicly available ultra-long whole human genome sequencing data obtained by nanopore sequencing. This technology may be a new tool for studying D4Z4 repeats and pathomechanism of FSHD in the future and has the potential to widen our understanding of subtelomeric regions.
Affiliation(s)
- Satomi Mitsuhashi
- Biomedical Informatics Laboratory, Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa, 259-1193, Japan
- Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa, 236-0004, Japan
| | - So Nakagawa
- Biomedical Informatics Laboratory, Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa, 259-1193, Japan
- Micro/Nano Technology Center, Tokai University, Hiratsuka, Kanagawa, 259-1291, Japan
| | - Mahoko Takahashi Ueda
- Micro/Nano Technology Center, Tokai University, Hiratsuka, Kanagawa, 259-1291, Japan
| | - Tadashi Imanishi
- Biomedical Informatics Laboratory, Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa, 259-1193, Japan
| | - Martin C Frith
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, 135-0064, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Chiba, 277-8562, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, 169-8555, Japan
| | - Hiroaki Mitsuhashi
- Department of Applied Biochemistry, School of Engineering, Tokai University, Hiratsuka, Kanagawa, 259-1292, Japan
| |
|