1
Sun H, Wang Y, Xiao Z, Huang X, Wang H, He T, Jiang X. multiMiAT: an optimal microbiome-based association test for multicategory phenotypes. Brief Bioinform 2023; 24:7005163. [PMID: 36702753 DOI: 10.1093/bib/bbad012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 12/31/2022] [Accepted: 01/03/2023] [Indexed: 01/28/2023] Open
Abstract
Microbes continuously affect the metabolism and immunity of the human body, and dysbiosis of the human microbiome drives not only the occurrence but also the progression of disease (i.e. multiple statuses of disease). Recently, many microbiome-based association tests have been developed to detect associations between the microbiome and host phenotypes. However, existing methods have not achieved satisfactory performance in testing the association between the microbiome and ordinal/nominal multicategory phenotypes (e.g. disease severity and tumor subtype). In this paper, we propose an optimal microbiome-based association test for multicategory phenotypes, namely, multiMiAT. Specifically, under the multinomial logit model framework, we first introduce a microbiome regression-based kernel association test for multicategory phenotypes (multiMiRKAT). As a data-driven optimal test, multiMiAT then integrates multiMiRKAT, score test and MiRKAT-MC to maintain excellent performance in diverse association patterns. Extensive simulation experiments demonstrate the effectiveness of our method. Furthermore, multiMiAT is applied to real microbiome data to detect the association between the gut microbiome and clinical statuses of colorectal cancer as well as diverse statuses of Clostridium difficile infections.
Affiliation(s)
- Han Sun
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Yue Wang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- Zhen Xiao
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Xiaoyun Huang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- Collaborative & Innovative Center for Educational Technology, Central China Normal University, Wuhan 430079, China
- Haodong Wang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- Tingting He
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
- Xingpeng Jiang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
2
Sekino M, Hashimoto K, Nakamichi R, Yamamoto M, Fujinami Y, Sasaki T. Introgressive hybridization in the west Pacific pen shells (genus Atrina): Restricted interspecies gene flow within the genome. Mol Ecol 2023; 32:2945-2963. [PMID: 36855846 DOI: 10.1111/mec.16908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 02/03/2023] [Accepted: 02/14/2023] [Indexed: 03/02/2023]
Abstract
A compelling interest in marine biology is to elucidate how species boundaries between sympatric free-spawning marine invertebrates such as bivalve molluscs are maintained in the face of potential hybridization. Hybrid zones provide the natural resources for us to study the underlying genetic mechanisms of reproductive isolation between hybridizing species. Against this backdrop, we examined the occurrence of introgressive hybridization (introgression) between two bivalves distributed in the western Pacific margin, Atrina japonica and Atrina lischkeana, based on single-nucleotide polymorphisms (SNPs) derived from restriction site-associated DNA sequencing. Using 1066 ancestry-informative SNP sites, we also investigated the extent of introgression within the genome to search for SNP sites with reduced interspecies gene flow. A series of our individual-level clustering analyses including the principal component analysis, Bayesian model-based clustering, and triangle plotting based on ancestry-heterozygosity relationships for an admixed population sample from the Seto Inland Sea (Japan) consistently suggested the presence of specimens with varying degrees of genomic admixture, thereby implying that the two species are not completely isolated. The Bayesian genomic cline analysis identified 10 SNP sites with reduced introgression, each of which was located within a genic region or an intergenic region physically close to a functional gene. No, or very few, heterozygotes were observed at these sites in the hybrid zone, suggesting that selection acts against heterozygotes. Accordingly, we raised the possibility that the SNP sites are within genomic regions that are incompatible between the two species. Our finding of restricted interspecies gene flow at certain genomic regions gives new insight into the maintenance of species boundaries in hybridizing broadcast-spawning molluscs.
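The triangle plot mentioned in this abstract places each specimen by its ancestry (hybrid index) and its interspecific heterozygosity. The following is a minimal sketch of those two coordinates, not the authors' pipeline. It assumes every SNP is fully diagnostic (fixed for different alleles in the two species) and that genotypes are coded as the number of copies of the allele typical of one species; the function name and the toy genotypes are hypothetical.

```python
from typing import List, Tuple


def hybrid_index_and_heterozygosity(genotypes: List[int]) -> Tuple[float, float]:
    """Return (hybrid index, interspecific heterozygosity) for one specimen.

    genotypes: per-SNP count (0, 1 or 2) of the allele assumed diagnostic
    for species B. This coding is an illustrative assumption.
    """
    if not genotypes:
        raise ValueError("at least one SNP genotype is required")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded as 0, 1 or 2")
    n_loci = len(genotypes)
    # Hybrid index: proportion of alleles inherited from species B.
    hybrid_index = sum(genotypes) / (2 * n_loci)
    # Interspecific heterozygosity: proportion of loci carrying one allele
    # from each species.
    heterozygosity = sum(1 for g in genotypes if g == 1) / n_loci
    return hybrid_index, heterozygosity


# Toy specimens (hypothetical data, 10 diagnostic SNPs each).
pure_a = [0] * 10                             # corner (0.0, 0.0) of the triangle
pure_b = [2] * 10                             # corner (1.0, 0.0)
f1_hybrid = [1] * 10                          # apex (0.5, 1.0)
backcross = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]    # lies on the edge toward species A

for name, g in [("pure A", pure_a), ("pure B", pure_b),
                ("F1", f1_hybrid), ("backcross", backcross)]:
    print(name, hybrid_index_and_heterozygosity(g))
```

Under these assumptions, pure parents sit at the bottom corners, first-generation hybrids at the apex, and later-generation or backcrossed individuals fall in between, which is the pattern of "varying degrees of genomic admixture" that the abstract describes.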
Affiliation(s)
- Masashi Sekino
- Fisheries Resources Institute, Japan Fisheries Research and Education Agency, Yokohama, Kanagawa, Japan
- Kazumasa Hashimoto
- Fisheries Technology Institute, Japan Fisheries Research and Education Agency, Nagasaki, Japan
- Reiichiro Nakamichi
- Fisheries Resources Institute, Japan Fisheries Research and Education Agency, Yokohama, Kanagawa, Japan
- Masayuki Yamamoto
- Fisheries Division, Kagawa Prefectural Government, Takamatsu, Kagawa, Japan
- Yuichiro Fujinami
- Goto Field Station, Fisheries Technology Institute, Japan Fisheries Research and Education Agency, Nagasaki, Japan
- Takenori Sasaki
- The University Museum, The University of Tokyo, Tokyo, Japan
3
Sun Y, Zhao L, Cai H, Liu W, Sun T. Composition and factors influencing community structure of lactic acid bacterial in dairy products from Nyingchi Prefecture of Tibet. J Biosci Bioeng 2023; 135:44-53. [PMID: 36384718 DOI: 10.1016/j.jbiosc.2022.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 10/17/2022] [Accepted: 10/20/2022] [Indexed: 11/15/2022]
Abstract
This study investigated the community composition of lactic acid bacteria (LAB) from yaks' milk (YM), Tibetan yellow cattle milk (TM) and their fermented products from different counties in the Nyingchi Prefecture, Tibet using Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing. Sequencing revealed 26 genera and 94 species from 71 dairy samples; amongst these Lactobacillus delbrueckii (36.17%), Streptococcus thermophilus (19.46%) and Lactococcus lactis (18.33%) were the predominant species. This study also identified the main factors influencing LAB community composition by comparing amongst samples from different locations, from different milk types, and from different altitudes. The LAB communities in YM and TM were more diverse than in fermented yaks' milk (FYM) and fermented Tibetan yellow cattle milk (FTM) samples. Similarly, whether milk was fermented or not accounted for differences in LAB species composition while altitude of the dairy products had very little effect. Milk source and production process were the most likely causes of drastic shifts in microbial community composition. In addition, fermented dairy products were enriched in genes responsible for secondary metabolic pathways that were potentially beneficial for health. Comprehensive descriptions of the microbiota in different dairy products from the Nyingchi Prefecture, Tibet might help elucidate evolutionary and functional relationships amongst bacterial communities in these products.
Affiliation(s)
- Yue Sun
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot 010018, PR China
- Lixia Zhao
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot 010018, PR China
- Hongyu Cai
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot 010018, PR China
- Wenjun Liu
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot 010018, PR China
- Tiansong Sun
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot 010018, PR China; Collaborative Innovative Center of Ministry of Education for Lactic Acid Bacteria and Fermented Dairy Products, Inner Mongolia Agricultural University, Hohhot 010018, PR China.
4
Enespa, Chandra P. Tool and techniques study to plant microbiome current understanding and future needs: an overview. Commun Integr Biol 2022; 15:209-225. [PMID: 35967908 PMCID: PMC9367660 DOI: 10.1080/19420889.2022.2082736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022] Open
Abstract
Microorganisms are ubiquitous and play both beneficial and harmful roles in human life, society, and the environment. The plant microbiome is a broad term for the microbes present in the rhizosphere, phyllosphere, or endophytic region that play beneficial or harmful roles in the plant. To study these microorganisms, it is essential to be able to isolate, purify, and identify them quickly under laboratory conditions. To improve microbial study, several tools and techniques such as microscopy, rRNA or rDNA sequencing, fingerprinting, probing, clone libraries, chips, and metagenomics have been developed. The major benefits of these techniques are that they identify the microbial community through direct analysis and can be applied in situ. Without such tools and techniques, we cannot understand the roles of microbiomes. This review explains the tools and their roles in understanding microbiomes and their ecological diversity in environments.
Affiliation(s)
- Enespa
- Department of Plant Pathology, School of Agriculture, SMPDC, University of Lucknow, Lucknow, India
- Prem Chandra
- Department of Environmental Microbiology, Babasaheb Bhimrao Ambedkar (A Central) University, Lucknow, India
5
Ramírez Rojas AA, Swidah R, Schindler D. Microbes of traditional fermentation processes as synthetic biology chassis to tackle future food challenges. Front Bioeng Biotechnol 2022; 10:982975. [PMID: 36185425 PMCID: PMC9523148 DOI: 10.3389/fbioe.2022.982975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 08/10/2022] [Indexed: 11/23/2022] Open
Abstract
Microbial diversity is magnificent and essential to almost all life on Earth. Microbes are an essential part of every human, allowing us to utilize otherwise inaccessible resources. It is no surprise that humans started, initially unconsciously, domesticating microbes for food production: one may call this microbial domestication 1.0. Sourdough bread is just one of the miracles performed by microbial fermentation, allowing extraction of more nutrients from flour and at the same time creating a fluffy and delicious loaf. There are a broad range of products the production of which requires fermentation such as chocolate, cheese, coffee and vinegar. Eventually, with the rise of microscopy, humans became aware of microbial life. Today our knowledge and technological advances allow us to genetically engineer microbes - one may call this microbial domestication 2.0. Synthetic biology and microbial chassis adaptation allow us to tackle current and future food challenges. One of the most apparent challenges is the limited space on Earth available for agriculture and its major tolls on the environment through use of pesticides and the replacement of ecosystems with monocultures. Further challenges include transport and packaging, exacerbated by the 24/7 on-demand mentality of many customers. Synthetic biology already tackles multiple food challenges and will be able to tackle many future food challenges. In this perspective article, we highlight recent microbial synthetic biology research to address future food challenges. We further give a perspective on how synthetic biology tools may teach old microbes new tricks, and what standardized microbial domestication could look like.
6
Zeb U, Wang X, AzizUllah A, Fiaz S, Khan H, Ullah S, Ali H, Shahzad K. Comparative genome sequence and phylogenetic analysis of chloroplast for evolutionary relationship among Pinus species. Saudi J Biol Sci 2022; 29:1618-1627. [PMID: 35280541 PMCID: PMC8913380 DOI: 10.1016/j.sjbs.2021.10.070] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/11/2021] [Revised: 08/24/2021] [Accepted: 10/31/2021] [Indexed: 01/02/2023] Open
Abstract
Pinus is a widely distributed genus of conifers in the Northern Hemisphere. However, limited genomic information constrains our understanding of the molecular phylogeny and evolution of Pinus species. In this study, the evolutionary features of the complete plastid genome and the phylogeny of the genus Pinus were studied. A total of thirteen divergent hotspot regions (trnk-UUU, matK, trnQ-UUG, atpF, atpH, rpoC1, rpoC2, rpoB, ycf2, ycf1, trnD-GUC, trnY-GUA, and trnH-GUG) were identified that could be used as genetic markers for phylogenetic and population genetic analyses of Pinus species. Furthermore, seven genes (petD, psaI, psaM, matK, rps18, ycf1, and ycf2) with positively selected sites in Pinus species were identified. Based on the whole genome, the phylogenetic analysis showed that twenty-four Pinus species form a significant genealogical clade. Divergence time estimation showed that Pinus species originated about 100 million years ago (MYA) (95% HPD, 101.76.35–109.79 MYA), in the later stages of the Cretaceous. Moreover, the two subgenera subsequently originated about 85.05 MYA (95% HPD, 81.04–88.02 MYA). This study provides a phylogenetic relationship and a chronological framework for the future study of the molecular evolution of Pinus species.
Affiliation(s)
- Umar Zeb
- Department of Biology, The University of Haripur, 22620, Pakistan
- Xiukang Wang
- College of Life Sciences, Yan’an University, Yan’an 716000, Shaanxi, China
- Corresponding authors.
- Sajid Fiaz
- Department of Plant Breeding and Genetics, The University of Haripur, 22620 Haripur, Pakistan
- Corresponding authors.
- Hanif Khan
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi’an 710069, China
- Shariat Ullah
- Department of Botany, University of Malakand, Pakistan
- Habib Ali
- Department of Agricultural Engineering, Khawaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Punjab, Pakistan
- Khurram Shahzad
- Department of Plant Breeding and Genetics, The University of Haripur, 22620 Haripur, Pakistan
7
Zimin AV, Salzberg SL. The SAMBA tool uses long reads to improve the contiguity of genome assemblies. PLoS Comput Biol 2022; 18:e1009860. [PMID: 35120119 PMCID: PMC8849508 DOI: 10.1371/journal.pcbi.1009860] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/22/2021] [Revised: 02/16/2022] [Accepted: 01/24/2022] [Indexed: 01/03/2023] Open
Abstract
Third-generation sequencing technologies can generate very long reads with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using shorter reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. One strategy to upgrade existing assemblies is to generate additional coverage using long-read data, and add that to the previously assembled contigs. SAMBA is a tool that is designed to scaffold and gap-fill existing genome assemblies with additional long-read data, resulting in substantially greater contiguity. SAMBA is the only tool of its kind that also computes and fills in the sequence for all spanned gaps in the scaffolds, yielding much longer contigs. Here we compare SAMBA to several similar tools capable of re-scaffolding assemblies using long-read data, and we show that SAMBA yields better contiguity and introduces fewer errors than competing methods. SAMBA is open-source software that is distributed at https://github.com/alekseyzimin/masurca.
The DNA molecule that is in almost every cell of a living organism can be represented as a sequence of four different nucleotides, or bases, denoted by the letters A, C, G, and T. Current sequencing technologies require breaking the DNA molecule into short fragments, sequencing them to find the corresponding sequences of letters (producing “reads”), and assembly, which recovers the DNA sequence from the reads. Repeats in genome sequences typically prevent one from recovering the full contiguous genome sequence, because any repeat that is longer than the read length cannot be reliably resolved. Third-generation sequencing technologies can generate very long reads, albeit with relatively high error rates. The lengths of the reads, which sometimes exceed one million bases, make them invaluable for resolving complex repeats that cannot be assembled using previous-generation reads. Many high-quality genome assemblies have already been produced, curated, and annotated using the previous generation of sequencing data, and full re-assembly of these genomes with long reads is not always practical or cost-effective. Here we introduce a tool called SAMBA that is designed to upgrade existing assemblies using additional coverage with long-read data, resulting in substantially greater contiguity.
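The abstract above reports "greater contiguity" without naming a metric here. A common summary of contiguity is N50, the contig length at which contigs of that length or longer hold at least half of the assembled bases. The sketch below computes it for toy contig lengths; it is an illustration only, assumes nothing about how SAMBA itself reports results, and the contig lengths and the 5-base filled gap are made up.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical assembly before gap filling: four separate contigs.
before = [40, 30, 20, 10]
# Hypothetical result after a long read spans the gap between the two
# largest contigs and 5 bases of gap sequence are filled in (40 + 5 + 30).
after = [75, 20, 10]

print("N50 before:", n50(before))
print("N50 after:", n50(after))
```

Joining two contigs through a filled gap raises N50 even though only a few bases are added, which is why gap filling can improve contiguity so sharply.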
Affiliation(s)
- Aleksey V. Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Steven L. Salzberg
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America
9
Zhong Y, Xu F, Wu J, Schubert J, Li MM. Application of Next Generation Sequencing in Laboratory Medicine. Ann Lab Med 2021; 41:25-43. [PMID: 32829577 PMCID: PMC7443516 DOI: 10.3343/alm.2021.41.1.25] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 02/27/2020] [Revised: 03/24/2020] [Accepted: 08/07/2020] [Indexed: 12/12/2022] Open
Abstract
The rapid development of next-generation sequencing (NGS) technology, including advances in sequencing chemistry, sequencing technologies, bioinformatics, and data interpretation, has facilitated its wide clinical application in precision medicine. This review describes current sequencing technologies, including short- and long-read sequencing technologies, and highlights the clinical application of NGS in inherited diseases, oncology, and infectious diseases. We review NGS approaches and clinical diagnosis for constitutional disorders; summarize the application of U.S. Food and Drug Administration-approved NGS panels, cancer biomarkers, minimal residual disease, and liquid biopsy in clinical oncology; and consider epidemiological surveillance, identification of pathogens, and the importance of host microbiome in infectious diseases. Finally, we discuss the challenges and future perspectives of clinical NGS tests.
Affiliation(s)
- Yiming Zhong
- Department of Pathology & Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Feng Xu
- Department of Pathology & Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Jinhua Wu
- Department of Pathology & Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Jeffrey Schubert
- Department of Pathology & Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Marilyn M. Li
- Department of Pathology & Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
10
Cheng QQ, Ouyang Y, Tang ZY, Lao CC, Zhang YY, Cheng CS, Zhou H. Review on the Development and Applications of Medicinal Plant Genomes. Front Plant Sci 2021; 12:791219. [PMID: 35003182 PMCID: PMC8732986 DOI: 10.3389/fpls.2021.791219] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/08/2021] [Accepted: 11/23/2021] [Indexed: 05/04/2023]
Abstract
With the development of sequencing technology, research on medicinal plants is no longer limited to chemistry, pharmacology, and pharmacodynamics, but also examines them at the genetic level. As next-generation sequencing has become affordable and long-read sequencing has become established, large medicinal plant genomes can be sequenced and assembled more easily. Although plant genomes have been reviewed several times, no review has given a systematic and comprehensive introduction to the development and application of the medicinal plant genomes reported so far. Here, we provide a historical perspective on the current state of genomes in medicinal plant biology, highlight the use of rapidly developing sequencing technologies, and comprehensively summarize how genomes are applied to solve practical problems in medicinal plants, such as genomics-assisted herb breeding, revelation of evolutionary history, herbal synthetic biology, and geoherbal research, which are important for the effective utilization, rational use and sustainable protection of medicinal plants.
Affiliation(s)
- Qi-Qing Cheng
- State Key Laboratory of Quality Research in Chinese Medicine, Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, Macao SAR, China
- Yue Ouyang
- State Key Laboratory of Quality Research in Chinese Medicine, Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, Macao SAR, China
- Zi-Yu Tang
- State Key Laboratory of Quality Research in Chinese Medicine, Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, Macao SAR, China
- Chi-Chou Lao
- State Key Laboratory of Quality Research in Chinese Medicine, Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, Macao SAR, China
- Yan-Yu Zhang
- State Key Laboratory of Quality Research in Chinese Medicine, Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, Macao SAR, China
- Chun-Song Cheng
- State Key Laboratory of Quality Research in Chinese Medicine, Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, Macao SAR, China
- Lushan Botanical Garden, Chinese Academy of Sciences, Jiujiang, China
- Hua Zhou
- State Key Laboratory of Quality Research in Chinese Medicine, Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, Macao SAR, China
- Joint Laboratory for Translational Cancer Research of Chinese Medicine, The Ministry of Education of the People’s Republic of China, Macau University of Science and Technology, Taipa, Macao SAR, China
- *Correspondence: Hua Zhou,
11
Abstract
Lyme disease (Lyme borreliosis) is a tick-borne zoonosis of adults and children caused by genospecies of the Borrelia burgdorferi sensu lato complex. The ailment, widespread throughout the Northern Hemisphere, continues to increase globally due to multiple environmental factors, coupled with increased incursion of humans into habitats that harbor the spirochete. B. burgdorferi sensu lato is transmitted by ticks from the Ixodes ricinus complex. In North America, B. burgdorferi causes nearly all infections; in Europe, B. afzelii and B. garinii are most associated with human disease. The spirochete's unusual fragmented genome encodes a plethora of differentially expressed outer surface lipoproteins that play a seminal role in the bacterium's ability to sustain itself within its enzootic cycle and cause disease when transmitted to its incidental human host. Tissue damage and symptomatology (i.e., clinical manifestations) result from the inflammatory response elicited by the bacterium and its constituents. The deposition of spirochetes into human dermal tissue generates a local inflammatory response that manifests as erythema migrans (EM), the hallmark skin lesion. If treated appropriately and early, the prognosis is excellent. However, in untreated patients, the disease may present with a wide range of clinical manifestations, most commonly involving the central nervous system, joints, or heart. A small percentage (~10%) of patients may go on to develop a poorly defined fibromyalgia-like illness, post-treatment Lyme disease (PTLD), that is unresponsive to prolonged antimicrobial therapy. Below we integrate current knowledge regarding the ecologic, epidemiologic, microbiologic, and immunologic facets of Lyme disease into a conceptual framework that sheds light on the disorder that healthcare providers encounter.
Affiliation(s)
- Justin D. Radolf
- Department of Medicine, UConn Health, Farmington, CT 06030, USA
- Department of Pediatrics, UConn Health, Farmington, CT 06030, USA
- Departments of Genetics and Genome Sciences, UConn Health, Farmington, CT 06030, USA
- Departments of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030, USA
- Department of Immunology, UConn Health, Farmington, CT 06030, USA
- Klemen Strle
- Division of Infectious Diseases, Wadsworth Center, NY Department of Health, Albany, NY 12208, USA
- Jacob E. Lemieux
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Franc Strle
- Department of Infectious Diseases, University Medical Center Ljubljana, Ljubljana, Slovenia
12
Wu L, Williams JS, Sun L, Kao TH. Sequence analysis of the Petunia inflata S-locus region containing 17 S-Locus F-Box genes and the S-RNase gene involved in self-incompatibility. Plant J 2020; 104:1348-1368. [PMID: 33048387 DOI: 10.1111/tpj.15005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/09/2020] [Revised: 08/28/2020] [Accepted: 09/01/2020] [Indexed: 06/11/2023]
Abstract
Self-incompatibility in Petunia is controlled by the polymorphic S-locus, which contains S-RNase encoding the pistil determinant and 16-20 S-locus F-box (SLF) genes collectively encoding the pollen determinant. Here we sequenced and assembled approximately 3.1 Mb of the S2-haplotype of the S-locus in Petunia inflata using bacterial artificial chromosome clones collectively containing all 17 SLF genes, SLFLike1, and S-RNase. Two SLF pseudogenes and 28 potential protein-coding genes were identified, 20 of which were also found at the S-loci of both the S6a-haplotype of P. inflata and the SN-haplotype of self-compatible Petunia axillaris, but not in the S-locus remnants of self-compatible potato (Solanum tuberosum) and tomato (Solanum lycopersicum). Comparative analyses of S-locus sequences of these three S-haplotypes revealed potential genetic exchange in the flanking regions of SLF genes, resulting in highly similar flanking regions between different types of SLF and between alleles of the same type of SLF of different S-haplotypes. The high degree of sequence similarity in the flanking regions could often be explained by the presence of similar long terminal repeat retroelements, which were enriched at the S-loci of all three S-haplotypes and in the flanking regions of all S-locus genes examined. We also found evidence of the association of transposable elements with SLF pseudogenes. Based on the hypothesis that SLF genes were derived by retrotransposition, we identified 10 F-box genes as putative SLF parent genes. Our results shed light on the importance of non-coding sequences in the evolution of the S-locus, and on possible evolutionary mechanisms of generation, proliferation, and deletion of SLF genes.
Affiliation(s)
- Lihua Wu
- Intercollege Graduate Degree Program in Plant Biology, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
| | - Justin S Williams
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
| | - Linhan Sun
- Intercollege Graduate Degree Program in Plant Biology, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
| | - Teh-Hui Kao
- Intercollege Graduate Degree Program in Plant Biology, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
|
13
|
Abstract
Pathogenic Vibrio cholerae strains express multiple virulence factors that are encoded by bacteriophage and chromosomal islands. These include cholera toxin and the intestinal colonization pilus called the toxin-coregulated pilus, which are essential for causing severe disease in humans. However, it is presently unclear how the expression of these horizontally acquired accessory virulence genes can be efficiently integrated with preexisting transcriptional programs that are presumably fine-tuned for optimal expression in V. cholerae before its conversion to a human pathogen. Here, we report the role of a transcriptional regulator (TsrA) in silencing horizontally acquired genes encoding important virulence factors. We propose that this factor could be critical to the efficient acquisition of accessory virulence genes by silencing their expression until other signals trigger their transcriptional activation within the host. Vibrio cholerae is a globally important pathogen responsible for the severe epidemic diarrheal disease called cholera. The current and ongoing seventh pandemic of cholera is caused by El Tor strains, which have completely replaced the sixth-pandemic classical strains of V. cholerae. To successfully establish infection and disseminate to new victims, V. cholerae relies on key virulence factors encoded on horizontally acquired genetic elements. The expression of these factors relies on the regulatory architecture that coordinates the timely expression of virulence determinants during host infection. Here, we apply transcriptomics and structural modeling to understand how type VI secretion system regulator A (TsrA) affects gene expression in both the classical and El Tor biotypes of V. cholerae. We find that TsrA acts as a negative regulator of V. cholerae virulence genes encoded on horizontally acquired genetic elements. 
The TsrA regulon comprises genes encoding cholera toxin (CT), the toxin-coregulated pilus (TCP), and the type VI secretion system (T6SS), as well as genes involved in biofilm formation. The majority of the TsrA regulon is carried on horizontally acquired AT-rich genetic islands whose loss or acquisition could be directly ascribed to the differences between the classical and El Tor strains studied. Our modeling predicts that the TsrA protein is a structural homolog of the histone-like nucleoid structuring protein (H-NS) oligomerization domain and is likely capable of forming higher-order superhelical structures, potentially with DNA. These findings describe how TsrA can integrate into the intricate V. cholerae virulence gene expression program, controlling gene expression through transcriptional silencing.
|
14
|
Multi-tissue transcriptome analysis using hybrid-sequencing reveals potential genes and biological pathways associated with azadirachtin A biosynthesis in neem (Azadirachta indica). BMC Genomics 2020; 21:749. [PMID: 33115410 PMCID: PMC7592523 DOI: 10.1186/s12864-020-07124-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/09/2020] [Accepted: 10/06/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Azadirachtin A is a triterpenoid from neem tree exhibiting excellent activities against over 600 insect species in agriculture. The production of azadirachtin A depends on extraction from neem tissues, which is not an eco-friendly and sustainable process. The low yield and discontinuous supply of azadirachtin A impedes further applications. The biosynthetic pathway of azadirachtin A is still unknown and is the focus of our study. RESULTS We attempted to explore azadirachtin A biosynthetic pathway and identified the key genes involved by analyzing transcriptome data from five neem tissues through the hybrid-sequencing (Illumina HiSeq and Pacific Biosciences Single Molecule Real-Time (SMRT)) approach. Candidates were first screened by comparing the expression levels between the five tissues. After phylogenetic analysis, domain prediction, and molecular docking studies, 22 candidates encoding 2,3-oxidosqualene cyclase (OSC), alcohol dehydrogenase, cytochrome P450 (CYP450), acyltransferase, and esterase were proposed to be potential genes involved in azadirachtin A biosynthesis. Among them, two unigenes encoding homologs of MaOSC1 and MaCYP71CD2 were identified. A unigene encoding the complete homolog of MaCYP71BQ5 was reported. Accuracy of the assembly was verified by quantitative real-time PCR (qRT-PCR) and full-length PCR cloning. CONCLUSIONS By integrating and analyzing transcriptome data from hybrid-seq technology, 22 differentially expressed genes (DEGs) were finally selected as candidates involved in azadirachtin A pathway. The obtained reliable and accurate sequencing data provided important novel information for understanding neem genome. Our data shed new light on understanding the biosynthesis of other triterpenoids in neem trees and provides a reference for exploring other valuable natural product biosynthesis in plants.
|
15
|
Zhang M, Dang N, Ren D, Zhao F, Lv R, Ma T, Bao Q, Menghe B, Liu W. Comparison of Bacterial Microbiota in Raw Mare's Milk and Koumiss Using PacBio Single Molecule Real-Time Sequencing Technology. Front Microbiol 2020; 11:581610. [PMID: 33193214 PMCID: PMC7652796 DOI: 10.3389/fmicb.2020.581610] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/09/2020] [Accepted: 10/07/2020] [Indexed: 11/18/2022] Open
Abstract
Koumiss is a traditional fermented raw mare’s milk product. It has high nutritional value and is well-known for its health-promoting effect as an alimentary supplement. This study aimed to investigate the bacterial diversity, especially lactic acid bacteria (LAB), in koumiss and raw mare’s milk. Forty-two samples, including koumiss and raw mare’s milk, were collected from the pastoral area in Yili, Kazakh Autonomous Prefecture, Xinjiang Uygur Autonomous Region in China. This work applied PacBio single-molecule real-time (SMRT) sequencing to profile full-length 16S rRNA genes, a powerful technology that enables bacterial taxonomic assignment at species-level precision. The SMRT sequencing identified 12 phyla, 124 genera, and 227 species across 29 koumiss samples. Eighteen phyla, 286 genera, and 491 species were found across 13 raw mare’s milk samples. The bacterial microbiota of the raw mare’s milk was more complex and diverse than that of the koumiss. Raw mare’s milk was rich in LAB, such as Lactobacillus (L.) helveticus, L. plantarum, Lactococcus (Lc.) lactis, and L. kefiranofaciens. In addition, raw mare’s milk also contained sequences representing pathogenic bacteria, such as Staphylococcus succinus, Acinetobacter lwoffii, Klebsiella (K.) oxytoca, and K. pneumoniae. The koumiss microbiota mainly comprised LAB, and sequences representing pathogenic bacteria were not detected. Meanwhile, the koumiss was enriched with secondary metabolic pathways that were potentially beneficial for health. Using a Random Forest model, the two kinds of samples could be distinguished with a high accuracy of 95.2% [area under the curve (AUC) = 0.98] based on 42 species and functions. Comprehensive depiction of the microbiota in raw mare’s milk and koumiss might help elucidate evolutionary and functional relationships among the bacterial communities in these dairy products.
The current work suffered from the limitation of a low sample size, so further work would be required to verify our findings.
Affiliation(s)
- Meng Zhang
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China.,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, China.,Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Na Dang
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China.,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, China.,Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Dongyan Ren
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China.,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, China.,Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Feiyan Zhao
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China.,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, China.,Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Ruirui Lv
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China.,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, China.,Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Teng Ma
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China.,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, China.,Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Qiuhua Bao
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China.,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, China.,Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Bilige Menghe
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China.,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, China.,Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Wenjun Liu
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China.,Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, China.,Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, China
|
16
|
Garg S, Aach J, Li H, Sebenius I, Durbin R, Church G. A haplotype-aware de novo assembly of related individuals using pedigree sequence graph. Bioinformatics 2020; 36:2385-2392. [PMID: 31860070 DOI: 10.1093/bioinformatics/btz942] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/19/2019] [Revised: 11/23/2019] [Accepted: 12/18/2019] [Indexed: 01/11/2023] Open
Abstract
MOTIVATION Reconstructing high-quality haplotype-resolved assemblies for related individuals has important applications in Mendelian diseases and population genomics. Through major genomics sequencing efforts such as the Personal Genome Project, the Vertebrate Genome Project (VGP) and the Genome in a Bottle project (GIAB), a variety of sequencing datasets from trios of diploid genomes are becoming available. Current trio assembly approaches are not designed to incorporate long- and short-read data from mother-father-child trios, and therefore require relatively high coverages of costly long-read data to produce high-quality assemblies. Thus, building a trio-aware assembler capable of producing accurate and chromosomal-scale diploid genomes of all individuals in a pedigree, while being cost-effective in terms of sequencing costs, is a pressing need of the genomics community. RESULTS We present a novel pedigree sequence graph based approach to diploid assembly using accurate Illumina data and long-read Pacific Biosciences (PacBio) data from all related individuals, thereby generalizing our previous work on single individuals. We demonstrate the effectiveness of our pedigree approach on a simulated trio of pseudo-diploid yeast genomes with different heterozygosity rates, and real data from human chromosome. We show that we require as little as 30× coverage Illumina data and 15× PacBio data from each individual in a trio to generate chromosomal-scale phased assemblies. Additionally, we show that we can detect and phase variants from generated phased assemblies. AVAILABILITY AND IMPLEMENTATION https://github.com/shilpagarg/WHdenovo.
Affiliation(s)
- Shilpa Garg
- Department of Genetics, Harvard Medical School.,Wyss Institute for Biologically Inspired Engineering, Harvard University
| | - John Aach
- Department of Genetics, Harvard Medical School
| | - Heng Li
- Department of Biomedical Informatics, Harvard Medical School, Boston
| | - Isaac Sebenius
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, USA
| | - Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - George Church
- Department of Genetics, Harvard Medical School.,Wyss Institute for Biologically Inspired Engineering, Harvard University
|
17
|
Characterization of Mobile Genetic Elements Using Long-Read Sequencing for Tracking Listeria monocytogenes from Food Processing Environments. Pathogens 2020; 9:pathogens9100822. [PMID: 33036450 PMCID: PMC7599586 DOI: 10.3390/pathogens9100822] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/01/2020] [Revised: 09/26/2020] [Accepted: 10/01/2020] [Indexed: 02/02/2023] Open
Abstract
Recently developed nanopore sequencing technologies offer a unique opportunity to rapidly close the genome and to identify complete sequences of mobile genetic elements (MGEs). In this study, 17 isolates of Listeria monocytogenes (Lm) epidemic clone II (ECII) from seven ready-to-eat meat or poultry processing facilities, not known to be associated with outbreaks, were shotgun sequenced, and among them, five isolates were further subjected to long-read sequencing. Additionally, 26 genomes of Lm ECII isolates associated with three listeriosis outbreaks in the U.S. and South Africa were obtained from the National Center for Biotechnology Information (NCBI) database and analyzed to evaluate if MGEs may be used as a high-resolution genetic marker for identifying and sourcing the origin of Lm. The analyses identified four comK prophages in 11 non-outbreak isolates from four facilities and three comK prophages in 20 isolates associated with two outbreaks that occurred in the U.S. In addition, three different plasmids were identified among 10 non-outbreak isolates and 14 outbreak isolates. Each comK prophage and plasmid was conserved among the isolates sharing it. Different prophages from different facilities or outbreaks had significant genetic variations, possibly due to horizontal gene transfer. Phylogenetic analysis showed that isolates from the same facility or the same outbreak always closely clustered. The time to the most recent common ancestor of the Lm ECII isolates was estimated to be in March 1816, with an average nucleotide substitution rate of 3.1 × 10⁻⁷ substitutions per site per year. This study showed that complete MGE sequences provide a good signal to determine the genetic relatedness of Lm isolates, to identify persistence or repeated contamination that occurred within food processing environments, and to study the evolutionary history among closely related isolates.
|
18
|
A survey on de novo assembly methods for single-molecular sequencing. Quantitative Biology 2020. [DOI: 10.1007/s40484-020-0214-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
19
|
Zhou Y, Zheng J, Wu Y, Zhang W, Jin J. A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes. BMC Genomics 2020; 21:183. [PMID: 32102653 PMCID: PMC7045542 DOI: 10.1186/s12864-020-6597-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/18/2019] [Accepted: 02/19/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Whole-genome approaches are widely preferred for species delineation in prokaryotes. However, these methods require pairwise alignments and calculations at the whole-genome level and thus are computationally intensive. To address this problem, a strategy consisting of sieving (pre-selecting closely related genomes) followed by alignment and calculation has been proposed. RESULTS Here, we initially test a published approach called "genome-wide tetranucleotide frequency correlation coefficient" (TETRA), which is specially tailored for sieving. Our results show that sieving by TETRA requires > 40% completeness for both genomes of a pair to yield > 95% sensitivity, indicating that TETRA is completeness-dependent. Accordingly, we develop a novel algorithm called "fragment tetranucleotide frequency correlation coefficient" (FRAGTE), which uses fragments rather than whole genomes for sieving. Our results show that FRAGTE achieves ~ 100% sensitivity and high specificity on simulated genomes, real genomes and metagenome-assembled genomes, demonstrating that FRAGTE is completeness-independent. Additionally, FRAGTE sieved a reduced number of total genomes for subsequent alignment and calculation, greatly improving computational efficiency for the process after sieving. Aside from this computational improvement, FRAGTE also reduces the computational cost of the sieving process itself. Consequently, FRAGTE greatly improves run efficiency both for sieving and for the steps after sieving (subsequent alignment and calculation), together accelerating genome-wide species delineation. CONCLUSIONS FRAGTE is a completeness-independent algorithm for sieving. Due to its high sensitivity, high specificity, highly reduced number of sieved genomes and highly improved runtime, FRAGTE will be helpful for whole-genome approaches to facilitate taxonomic studies in prokaryotes.
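As a rough illustration of the sieving idea in the abstract above, the sketch below correlates the tetranucleotide composition of two sequences. It is a simplification and not the authors' code: it uses plain relative frequencies rather than the z-scores of the published TETRA method, it does not implement FRAGTE's fragment selection, and the function names (`tetra_frequencies`, `tetra_correlation`) are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so frequency vectors are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tetra_frequencies(seq):
    """Relative tetranucleotide frequencies counted over both strands.

    Windows containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - 3):
            kmer = strand[i:i + 4]
            if kmer in counts:
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotides")
    return [counts[k] / total for k in KMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def tetra_correlation(seq_a, seq_b):
    """Simplified composition similarity; higher suggests closer relatedness."""
    return pearson(tetra_frequencies(seq_a), tetra_frequencies(seq_b))
```

In a sieving workflow, pairs whose correlation falls below a chosen cutoff would be discarded before the expensive whole-genome alignment; the cutoff itself is study-specific and is not given here.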
Affiliation(s)
- Yizhuang Zhou
- Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China. .,Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, People's Republic of China.
| | - Jifang Zheng
- Guangxi Key Laboratory of Tumor Immunology and Microenvironmental Regulation, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
| | - Yepeng Wu
- China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China.,Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
| | - Wenting Zhang
- Guangxi Key Laboratory of Tumor Immunology and Microenvironmental Regulation, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
| | - Junfei Jin
- Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China. .,China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China. .,Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China.
|
20
|
Qin M, Wu S, Li A, Zhao F, Feng H, Ding L, Ruan J. LRScaf: improving draft genomes using long noisy reads. BMC Genomics 2019; 20:955. [PMID: 31818249 PMCID: PMC6902338 DOI: 10.1186/s12864-019-6337-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 03/04/2019] [Accepted: 11/26/2019] [Indexed: 12/15/2022] Open
Abstract
Background The advent of third-generation sequencing (TGS) technologies opens the door to improved genome assembly. Long reads are promising for enhancing the quality of fragmented draft assemblies constructed from next-generation sequencing (NGS) technologies. To date, a few algorithms that are capable of improving draft assemblies have been released, including SSPACE-LongRead, OPERA-LG, SMIS, npScarf, DBG2OLC, Unicycler, and LINKS. Hybrid assembly on large genomes remains challenging, however. Results We develop a scalable and computationally efficient scaffolder, Long Reads Scaffolder (LRScaf, https://github.com/shingocat/lrscaf), that is capable of significantly boosting assembly contiguity using long reads. In this study, we summarise a comprehensive performance assessment for state-of-the-art scaffolders and LRScaf on seven organisms, i.e., E. coli, S. cerevisiae, A. thaliana, O. sativa, S. pennellii, Z. mays, and H. sapiens. LRScaf significantly improves the contiguity of draft assemblies, e.g., increasing the NGA50 value of CHM1 from 127.1 kbp to 9.4 Mbp using a 20-fold coverage PacBio dataset and the NGA50 value of NA12878 from 115.3 kbp to 12.9 Mbp using a 35-fold coverage Nanopore dataset. Besides, LRScaf generates the best contiguous NGA50 on A. thaliana, S. pennellii, Z. mays, and H. sapiens. Moreover, LRScaf has the shortest run time compared with other scaffolders, and the peak RAM of LRScaf remains practical for large genomes (e.g., 20.3 and 62.6 GB on CHM1 and NA12878, respectively). Conclusions The new algorithm, LRScaf, yields the best or, at least, moderate scaffold contiguity and accuracy in the shortest run time compared with other scaffolding algorithms. Furthermore, LRScaf provides a cost-effective way to improve contiguity of draft assemblies on large genomes.
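The contiguity figures quoted in the abstract above are NGA50 values. As a minimal sketch of the underlying idea, the code below computes the simpler length-only statistics N50 and NG50 from a list of contig or scaffold lengths. It does not compute NGA50 itself, which additionally requires aligning the assembly to a reference and splitting sequences at misassemblies. The function names and the example lengths are hypothetical, not data from the paper.

```python
def nx(lengths, total, fraction=0.5):
    """Length L such that sequences of length >= L sum to at least fraction * total.

    Returns 0 if the sequences never reach the target, which can happen for
    NG50 when the assembly is much smaller than the genome.
    """
    target = total * fraction
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0


def n50(lengths):
    """N50: the target is half of the total assembly length."""
    return nx(lengths, sum(lengths))


def ng50(lengths, genome_size):
    """NG50: the target is half of the (estimated) genome size."""
    return nx(lengths, genome_size)


# Hypothetical draft assembly, lengths in kbp.
draft = [80, 70, 50, 40, 30, 20]
# The same assembly after a scaffolder joins the two longest contigs.
scaffolded = [150, 50, 40, 30, 20]

print(n50(draft), n50(scaffolded))
print(ng50(draft, genome_size=400))
```

Joining contigs leaves the total length unchanged but moves more of it into long sequences, which is why scaffolding raises these statistics.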
Affiliation(s)
- Mao Qin
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 7, Pengfei Road, Dapeng District, Shenzhen, 518120, Guangdong, China
| | - Shigang Wu
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 7, Pengfei Road, Dapeng District, Shenzhen, 518120, Guangdong, China
| | - Alun Li
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 7, Pengfei Road, Dapeng District, Shenzhen, 518120, Guangdong, China
| | - Fengli Zhao
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 7, Pengfei Road, Dapeng District, Shenzhen, 518120, Guangdong, China
| | - Hu Feng
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 7, Pengfei Road, Dapeng District, Shenzhen, 518120, Guangdong, China
| | - Lulu Ding
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 7, Pengfei Road, Dapeng District, Shenzhen, 518120, Guangdong, China
| | - Jue Ruan
- Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 7, Pengfei Road, Dapeng District, Shenzhen, 518120, Guangdong, China.
|
21
|
Multiplexed Non-barcoded Long-Read Sequencing and Assembling Genomes of Bacillus Strains in Error-Free Simulations. Curr Microbiol 2019; 77:79-84. [PMID: 31722044 DOI: 10.1007/s00284-019-01808-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2019] [Accepted: 11/02/2019] [Indexed: 10/25/2022]
Abstract
The generation of genomic data from microorganisms has revolutionized our abilities to understand their biology, but it is still challenging to obtain complete genome sequences of microbes in an automated high-throughput and cost-effective manner. While the advent of second-generation sequencing technologies provided significantly higher throughput, their shorter read lengths and more pronounced sequence-context bias led to a shift towards resequencing applications. Recently, single molecule real-time (SMRT) DNA sequencing has been used to generate sequencing reads that are much longer than those of other sequencing platforms, facilitating de novo genome assembly and genome finishing. Here we introduced a novel multiplex strategy to make full use of the capacity and characteristics of SMRT sequencing in microbe genome assembly. We used error-free simulations to evaluate the practicability of assembling SMRT genomic sequencing data from multiple microbes into finished genomes all at once. Then we compared the influence of two key factors, sequencing coverage and read length, on multiplex assembly. Our results showed that long-read genomic sequencing inherently provided the ability to assemble genomic sequencing data from multiple microbes into finished genomes due to its long read length. This approach might be helpful for the various groups of microbial genome projects or metagenomics research.
|
22
|
Ang MY, Low TY, Lee PY, Wan Mohamad Nazarie WF, Guryev V, Jamal R. Proteogenomics: From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine. Clin Chim Acta 2019; 498:38-46. [DOI: 10.1016/j.cca.2019.08.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/22/2019] [Revised: 08/13/2019] [Accepted: 08/13/2019] [Indexed: 12/14/2022]
|
23
|
Haghshenas E, Sahinalp SC, Hach F. lordFAST: sensitive and Fast Alignment Search Tool for LOng noisy Read sequencing Data. Bioinformatics 2019; 35:20-27. [PMID: 30561550 DOI: 10.1093/bioinformatics/bty544] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/16/2017] [Accepted: 06/28/2018] [Indexed: 02/01/2023] Open
Abstract
Motivation Recent advances in genomics and precision medicine have been made possible through the application of high throughput sequencing (HTS) to large collections of human genomes. Although HTS technologies have proven their use in cataloging human genome variation, computational analysis of the data they generate is still far from being perfect. The main limitation of Illumina and other popular sequencing technologies is their short read length relative to the lengths of (common) genomic repeats. Newer (single molecule sequencing - SMS) technologies such as Pacific Biosciences and Oxford Nanopore are producing longer reads, making it theoretically possible to overcome the difficulties imposed by repeat regions. Unfortunately, because of their high sequencing error rate, reads generated by these technologies are very difficult to work with and cannot be used in many of the standard downstream analysis pipelines. Note that it is not only difficult to find the correct mapping locations of such reads in a reference genome, but also to establish their correct alignment so as to differentiate sequencing errors from real genomic variants. Furthermore, especially since newer SMS instruments provide higher throughput, mapping and alignment need to be performed much faster than before, maintaining high sensitivity. Results We introduce lordFAST, a novel long-read mapper that is specifically designed to align reads generated by PacBio and potentially other SMS technologies to a reference. lordFAST not only has higher sensitivity than the available alternatives, it is also among the fastest and has a very low memory footprint. Availability and implementation lordFAST is implemented in C++ and supports multi-threading. The source code of lordFAST is available at https://github.com/vpc-ccg/lordfast. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ehsan Haghshenas
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | - S Cenk Sahinalp
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.,School of Informatics and Computing, Indiana University, Bloomington, IN, USA
| | - Faraz Hach
- Vancouver Prostate Centre, Vancouver, BC, Canada.,Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada
|
24
|
Hubbard TP, Billings G, Dörr T, Sit B, Warr AR, Kuehl CJ, Kim M, Delgado F, Mekalanos JJ, Lewnard JA, Waldor MK. A live vaccine rapidly protects against cholera in an infant rabbit model. Sci Transl Med 2019; 10:10/445/eaap8423. [PMID: 29899024 DOI: 10.1126/scitranslmed.aap8423] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 08/31/2017] [Accepted: 03/26/2018] [Indexed: 12/17/2022]
Abstract
Outbreaks of cholera, a rapidly fatal diarrheal disease, often spread explosively. The efficacy of reactive vaccination campaigns (deploying Vibrio cholerae vaccines during epidemics) is partially limited by the time required for vaccine recipients to develop adaptive immunity. We created HaitiV, a live attenuated cholera vaccine candidate, by deleting diarrheagenic factors from a recent clinical isolate of V. cholerae and incorporating safeguards against vaccine reversion. We demonstrate that administration of HaitiV 24 hours before lethal challenge with wild-type V. cholerae reduced intestinal colonization by the wild-type strain, slowed disease progression, and reduced mortality in an infant rabbit model of cholera. HaitiV-mediated protection required viable vaccine, and rapid protection kinetics are not consistent with development of adaptive immunity. These features suggest that HaitiV mediates probiotic-like protection from cholera, a mechanism that is not known to be elicited by traditional vaccines. Mathematical modeling indicates that an intervention that works at the speed of HaitiV-mediated protection could improve the public health impact of reactive vaccination.
Affiliation(s)
- Troy P Hubbard
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA 02115, USA
- Gabriel Billings
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA 02115, USA
- Tobias Dörr
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA 02115, USA
- Brandon Sit
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA 02115, USA
- Alyson R Warr
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA 02115, USA
- Carole J Kuehl
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA 02115, USA
- Minsik Kim
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA 02115, USA
- Fernanda Delgado
- Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Boston, MA 02115, USA
- John J Mekalanos
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Joseph A Lewnard
- Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
- Matthew K Waldor
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Division of Infectious Diseases, Brigham and Women's Hospital, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Boston, MA 02115, USA
- Department of Immunology and Infectious Disease, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
|
25
|
Varré JS, D'Agostino N, Touzet P, Gallina S, Tamburino R, Cantarella C, Ubrig E, Cardi T, Drouard L, Gualberto JM, Scotti N. Complete Sequence, Multichromosomal Architecture and Transcriptome Analysis of the Solanum tuberosum Mitochondrial Genome. Int J Mol Sci 2019; 20:E4788. [PMID: 31561566 PMCID: PMC6801519 DOI: 10.3390/ijms20194788] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/01/2019] [Revised: 09/19/2019] [Accepted: 09/24/2019] [Indexed: 12/01/2022] Open
Abstract
Mitochondrial genomes (mitogenomes) in higher plants can induce cytoplasmic male sterility and be somehow involved in nuclear-cytoplasmic interactions affecting plant growth and agronomic performance. They are larger and more complex than in other eukaryotes, due to their recombinogenic nature. For most plants, the mitochondrial DNA (mtDNA) can be represented as a single circular chromosome, the so-called master molecule, which includes repeated sequences that recombine frequently, generating sub-genomic molecules in various proportions. Given the relevance of the potato crop worldwide, we report here the complete mtDNA sequence of two S. tuberosum cultivars, namely Cicero and Désirée, and a comprehensive study of its expression, based on high-coverage RNA sequencing data. We found that the potato mitogenome has a multi-partite architecture, divided into at least three independent molecules that, according to our data, should behave as autonomous chromosomes. Inter-cultivar variability was null, while comparative analyses with other species of the Solanaceae family allowed the investigation of the evolutionary history of their mitogenomes. The RNA-seq data revealed peculiarities in transcriptional and post-transcriptional processing of mRNAs. These included co-transcription of genes with open reading frames that are probably expressed, methylation of an rRNA at a position that should impact translation efficiency and extensive RNA editing, with a high proportion of partial editing implying frequent mis-targeting by the editing machinery.
Affiliation(s)
- Jean-Stéphane Varré
- Univ. Lille, CNRS, Centrale Lille, UMR 9189-CRIStAL-Centre de Recherche en Informatique Signal et Automatique de Lille, F-59000 Lille, France.
- Nunzio D'Agostino
- CREA Research Centre for Vegetable and Ornamental Crops, 84098 Pontecagnano Faiano, SA, Italy.
- Pascal Touzet
- Univ. Lille, CNRS, UMR 8198-Evo-Eco-Paleo, F-59000 Lille, France.
- Sophie Gallina
- Univ. Lille, CNRS, UMR 8198-Evo-Eco-Paleo, F-59000 Lille, France.
- Rachele Tamburino
- CNR-IBBR, National Research Council of Italy, Institute of Biosciences and BioResources, 80055 Portici, NA, Italy.
- Concita Cantarella
- CREA Research Centre for Vegetable and Ornamental Crops, 84098 Pontecagnano Faiano, SA, Italy.
- Elodie Ubrig
- Institut de Biologie Moléculaire des Plantes-CNRS, Université de Strasbourg, Strasbourg 67084, France.
- Teodoro Cardi
- CREA Research Centre for Vegetable and Ornamental Crops, 84098 Pontecagnano Faiano, SA, Italy.
- Laurence Drouard
- Institut de Biologie Moléculaire des Plantes-CNRS, Université de Strasbourg, Strasbourg 67084, France.
- José Manuel Gualberto
- Institut de Biologie Moléculaire des Plantes-CNRS, Université de Strasbourg, Strasbourg 67084, France.
- Nunzia Scotti
- CNR-IBBR, National Research Council of Italy, Institute of Biosciences and BioResources, 80055 Portici, NA, Italy.
|
26
|
Gargis AS, Cherney B, Conley AB, McLaughlin HP, Sue D. Rapid Detection of Genetic Engineering, Structural Variation, and Antimicrobial Resistance Markers in Bacterial Biothreat Pathogens by Nanopore Sequencing. Sci Rep 2019; 9:13501. [PMID: 31534162 PMCID: PMC6751186 DOI: 10.1038/s41598-019-49700-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/20/2019] [Accepted: 08/27/2019] [Indexed: 01/10/2023] Open
Abstract
Widespread release of Bacillus anthracis (anthrax) or Yersinia pestis (plague) would prompt a public health emergency. During an exposure event, high-quality whole genome sequencing (WGS) can identify genetic engineering, including the introduction of antimicrobial resistance (AMR) genes. Here, we developed rapid WGS laboratory and bioinformatics workflows using a long-read nanopore sequencer (MinION) for Y. pestis (6.5 h) and B. anthracis (8.5 h) and sequenced strains with different AMR profiles. Both salt-precipitation and silica-membrane extracted DNA were suitable for MinION WGS using both rapid and field library preparation methods. In replicate experiments, nanopore quality metrics were defined for genome assembly and mutation analysis. AMR markers were correctly detected and >99% coverage of chromosomes and plasmids was achieved using 100,000 raw sequencing reads. While chromosomes and large and small plasmids were accurately assembled, including novel multimeric forms of the Y. pestis virulence plasmid, pPCP1, MinION reads were error-prone, particularly in homopolymer regions. MinION sequencing holds promise as a practical, front-line strategy for on-site pathogen characterization to speed the public health response during a biothreat emergency.
Affiliation(s)
- Amy S Gargis
- Division of Preparedness and Emerging Infections, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
- Division of Healthcare Quality Promotion, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
- Blake Cherney
- Division of Preparedness and Emerging Infections, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Andrew B Conley
- IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, Georgia, USA
- Heather P McLaughlin
- Division of Preparedness and Emerging Infections, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- David Sue
- Division of Preparedness and Emerging Infections, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
|
27
|
Vassaux A, Meunier L, Vandenbol M, Baurain D, Fickers P, Jacques P, Leclère V. Nonribosomal peptides in fungal cell factories: from genome mining to optimized heterologous production. Biotechnol Adv 2019; 37:107449. [PMID: 31518630 DOI: 10.1016/j.biotechadv.2019.107449] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/07/2019] [Revised: 09/06/2019] [Accepted: 09/09/2019] [Indexed: 12/15/2022]
Abstract
Fungi are notoriously prolific producers of secondary metabolites including nonribosomal peptides (NRPs). The structural complexity of NRPs grants them interesting activities such as antibiotic, anti-cancer, and anti-inflammatory properties. The discovery of these compounds with attractive activities can be achieved by using two approaches: either by screening samples originating from various environments for their biological activities, or by identifying the related clusters in genomic sequences thanks to bioinformatics tools. This genome mining approach has grown tremendously due to recent advances in genome sequencing, which have provided an incredible amount of genomic data from hundreds of microbial species. Regarding fungal organisms, the genomic data have revealed the presence of an unexpected number of putative NRP-related gene clusters. This highlights fungi as a goldmine for the discovery of putative novel bioactive compounds. Recent developments in NRP-dedicated bioinformatics tools have increased the capacity to identify these gene clusters and to deduce NRP structures, speeding up the screening process for novel metabolite discovery. Unfortunately, the newly identified compound is frequently not produced, or only poorly produced, by native producers due to a lack of expression of the related gene cluster. A frequently employed strategy to increase production rates consists of transferring the related biosynthetic pathway into heterologous hosts. This review aims to provide a comprehensive overview of NRP discovery, from gene cluster identification by genome mining to heterologous production in fungal hosts. The main computational tools and methods for genome mining are presented herein with an emphasis on the particularities of fungal systems. The different steps of the reconstitution of the NRP biosynthetic pathway in heterologous fungal cell factories will be discussed, as well as the key factors to consider for maximizing productivity. Several examples will be developed to illustrate the potential of heterologous production both to discover uncharacterized novel compounds predicted in silico by genome mining and to enhance the productivity of interesting bio-active natural products.
Affiliation(s)
- Antoine Vassaux
- TERRA Teaching and Research Centre, Microbial Processes and Interactions, Gembloux Agro-Bio Tech, University of Liege, Avenue de la Faculté d'Agronomie, B5030 Gembloux, Belgium; Univ. Lille, INRA, ISA, Univ. Artois, Univ. Littoral Côte d'Opale, EA 7394-ICV-Institut Charles Viollette, F-59000 Lille, France
- Loïc Meunier
- TERRA Teaching and Research Centre, Microbial Processes and Interactions, Gembloux Agro-Bio Tech, University of Liege, Avenue de la Faculté d'Agronomie, B5030 Gembloux, Belgium; InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liege, Boulevard du Rectorat 27, B-4000 Liège, Belgium
- Micheline Vandenbol
- TERRA Teaching and Research Centre, Microbiologie et Génomique, Gembloux Agro-Bio Tech, University of Liege, Avenue de la Faculté d'Agronomie, B5030 Gembloux, Belgium
- Denis Baurain
- InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liege, Boulevard du Rectorat 27, B-4000 Liège, Belgium
- Patrick Fickers
- TERRA Teaching and Research Centre, Microbial Processes and Interactions, Gembloux Agro-Bio Tech, University of Liege, Avenue de la Faculté d'Agronomie, B5030 Gembloux, Belgium
- Philippe Jacques
- TERRA Teaching and Research Centre, Microbial Processes and Interactions, Gembloux Agro-Bio Tech, University of Liege, Avenue de la Faculté d'Agronomie, B5030 Gembloux, Belgium
- Valérie Leclère
- Univ. Lille, INRA, ISA, Univ. Artois, Univ. Littoral Côte d'Opale, EA 7394-ICV-Institut Charles Viollette, F-59000 Lille, France.
|
28
|
Mukherjee S, Cai Z, Mukherjee A, Longkumer I, Mech M, Vupru K, Khate K, Rajkhowa C, Mitra A, Guldbrandtsen B, Lund MS, Sahana G. Whole genome sequence and de novo assembly revealed genomic architecture of Indian Mithun (Bos frontalis). BMC Genomics 2019; 20:617. [PMID: 31357931 PMCID: PMC6664528 DOI: 10.1186/s12864-019-5980-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/18/2018] [Accepted: 07/16/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Mithun (Bos frontalis), also called gayal, is an endangered bovine species of the tribe Bovini with 2n = 58 XX chromosome complements, reared in the tropical rain forest regions of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed and information on its genomic architecture is scanty so far. We trust that the availability of its whole genome sequence data and assembly will go a long way toward solving this problem and help to generate much information, including the phylogenetic status of mithun. Recently, the first genome assembly of gayal, mithun of Chinese origin, was published. However, an improved reference genome assembly would still be beneficial for understanding genetic variation in mithun populations reared in diverse geographical locations and for building a superior consensus assembly. We, therefore, performed deep sequencing of the genome of an adult female mithun from India, assembled and annotated its genome and performed extensive bioinformatic analyses to produce a superior de novo genome assembly of mithun. RESULTS We generated ≈300 Gigabyte (Gb) raw reads from whole-genome deep sequencing platforms and assembled the sequence data using a hybrid assembly strategy to create a high quality de novo assembly of mithun with 96% recovered as per BUSCO analysis. The final genome assembly has a total length of 3.0 Gb and contains 5,015 scaffolds with an N50 value of 1 Mb. Repeat sequences constitute around 43.66% of the assembly. The genomic alignments between mithun and cattle showed that their genomes, as expected, are highly conserved. Gene annotation identified 28,044 protein-coding genes present in the mithun genome. The gene orthologous groups of mithun showed a high degree of similarity in comparison with other species, while fewer mithun-specific coding sequences were found compared to those in cattle.
CONCLUSION Here we presented the first de novo draft genome assembly of Indian mithun, which has better coverage, is less fragmented and better annotated, and constitutes a reasonably complete assembly compared to the previously published gayal genome. This comprehensive assembly unravelled the genomic architecture of mithun to a great extent and will provide a reference genome assembly to the research community to elucidate the evolutionary history of mithun across its distinct geographical locations.
Affiliation(s)
- Sabyasachi Mukherjee
- Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Zexi Cai
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
- Anupama Mukherjee
- Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Present address: Dairy Cattle Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana 132001 India
- Imsusosang Longkumer
- Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Moonmoon Mech
- Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Kezhavituo Vupru
- Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Kobu Khate
- Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Chandan Rajkhowa
- Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Abhijit Mitra
- Animal Genetics and Breeding Lab., ICAR-National Research Centre on Mithun, Medziphema, Nagaland 797106 India
- Bernt Guldbrandtsen
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830 Tjele, Denmark
|
29
|
Fletcher K, Gil J, Bertier LD, Kenefick A, Wood KJ, Zhang L, Reyes-Chin-Wo S, Cavanaugh K, Tsuchida C, Wong J, Michelmore R. Genomic signatures of heterokaryosis in the oomycete pathogen Bremia lactucae. Nat Commun 2019; 10:2645. [PMID: 31201315 PMCID: PMC6570648 DOI: 10.1038/s41467-019-10550-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 11/27/2018] [Accepted: 05/14/2019] [Indexed: 12/26/2022] Open
Abstract
Lettuce downy mildew caused by Bremia lactucae is the most important disease of lettuce globally. This oomycete is highly variable and rapidly overcomes resistance genes and fungicides. The use of multiple read types results in a high-quality, near-chromosome-scale, consensus assembly. Flow cytometry plus resequencing of 30 field isolates, 37 sexual offspring, and 19 asexual derivatives from single multinucleate sporangia demonstrates a high incidence of heterokaryosis in B. lactucae. Heterokaryosis has phenotypic consequences on fitness that may include an increased sporulation rate and qualitative differences in virulence. Therefore, selection should be considered as acting on a population of nuclei within coenocytic mycelia. This provides evolutionary flexibility to the pathogen enabling rapid adaptation to different repertoires of host resistance genes and other challenges. The advantages of asexual persistence of heterokaryons may have been one of the drivers of selection that resulted in the loss of uninucleate zoospores in multiple downy mildews.
Affiliation(s)
- Kyle Fletcher
- Genome Center, University of California, Davis, CA, 95616, USA
- Juliana Gil
- Genome Center, University of California, Davis, CA, 95616, USA
- Plant Pathology Graduate Group, University of California, Davis, CA, 95616, USA
- Lien D Bertier
- Genome Center, University of California, Davis, CA, 95616, USA
- Aubrey Kenefick
- Genome Center, University of California, Davis, CA, 95616, USA
- Kelsey J Wood
- Genome Center, University of California, Davis, CA, 95616, USA
- Integrated Genetics and Genomics Graduate Group, University of California, Davis, CA, 95616, USA
- Lin Zhang
- Genome Center, University of California, Davis, CA, 95616, USA
- Sebastian Reyes-Chin-Wo
- Genome Center, University of California, Davis, CA, 95616, USA
- Integrated Genetics and Genomics Graduate Group, University of California, Davis, CA, 95616, USA
- Bayer Crop Science, 37437 CA-16, Woodland, CA, 95695, USA
- Keri Cavanaugh
- Genome Center, University of California, Davis, CA, 95616, USA
- Cayla Tsuchida
- Genome Center, University of California, Davis, CA, 95616, USA
- Plant Pathology Graduate Group, University of California, Davis, CA, 95616, USA
- Arcadia Biosciences, Davis, CA, 95616, USA
- Joan Wong
- Genome Center, University of California, Davis, CA, 95616, USA
- Plant Biology Graduate Group, University of California, Davis, CA, 95616, USA
- Pacific Biosciences of California, Inc., Menlo Park, CA, 94025, USA
- Richard Michelmore
- Genome Center, University of California, Davis, CA, 95616, USA.
- Departments of Plant Sciences, Molecular and Cellular Biology, Medical Microbiology and Immunology, University of California, Davis, CA, 95616, USA.
|
30
|
|
31
|
Jiang JB, Quattrini AM, Francis WR, Ryan JF, Rodríguez E, McFadden CS. A hybrid de novo assembly of the sea pansy (Renilla muelleri) genome. Gigascience 2019; 8:giz026. [PMID: 30942866 PMCID: PMC6446218 DOI: 10.1093/gigascience/giz026] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 09/21/2018] [Revised: 01/15/2019] [Accepted: 02/28/2019] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND More than 3,000 species of octocorals (Cnidaria, Anthozoa) inhabit an expansive range of environments, from shallow tropical seas to the deep-ocean floor. They are important foundation species that create coral "forests," which provide unique niches and 3-dimensional living space for other organisms. The octocoral genus Renilla inhabits sandy, continental shelves in the subtropical and tropical Atlantic and eastern Pacific Oceans. Renilla is especially interesting because it produces secondary metabolites for defense, exhibits bioluminescence, and produces a luciferase that is widely used in dual-reporter assays in molecular biology. Although several anthozoan genomes are currently available, the majority of these are hexacorals. Here, we present a de novo assembly of an azooxanthellate shallow-water octocoral, Renilla muelleri. FINDINGS We generated a hybrid de novo assembly using MaSuRCA v.3.2.6. The final assembly included 4,825 scaffolds and a haploid genome size of 172 megabases (Mb). A BUSCO assessment found 88% of metazoan orthologs present in the genome. An Augustus ab initio gene prediction found 23,660 genes, of which 66% (15,635) had detectable similarity to annotated genes from the starlet sea anemone, Nematostella vectensis, or to the Uniprot database. Although the R. muelleri genome may be smaller (172 Mb minimum size) than other publicly available coral genomes (256-448 Mb), the R. muelleri genome is similar to other coral genomes in terms of the number of complete metazoan BUSCOs and predicted gene models. CONCLUSIONS The R. muelleri hybrid genome provides a novel resource for researchers to investigate the evolution of genes and gene families within Octocorallia and more widely across Anthozoa. It will be a key resource for future comparative genomics with other corals and for understanding the genomic basis of coral diversity.
Affiliation(s)
- Justin B Jiang
- Department of Biology, Harvey Mudd College, 1250 N. Dartmouth Ave., Claremont, CA 91711, USA
- Andrea M Quattrini
- Department of Biology, Harvey Mudd College, 1250 N. Dartmouth Ave., Claremont, CA 91711, USA
- Warren R Francis
- Department of Biology, University of Southern Denmark, Campusvej 55, Odense M 5230, Denmark
- Joseph F Ryan
- Whitney Laboratory for Marine Bioscience, University of Florida, 9505 Ocean Shore Blvd., St. Augustine, FL 32080, USA
- Estefanía Rodríguez
- Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th St., New York, NY 10024, USA
- Catherine S McFadden
- Department of Biology, Harvey Mudd College, 1250 N. Dartmouth Ave., Claremont, CA 91711, USA
|
32
|
Complete Genome Sequence and Comparative Analysis of Synechococcus sp. CS-601 (SynAce01), a Cold-Adapted Cyanobacterium from an Oligotrophic Antarctic Habitat. Int J Mol Sci 2019; 20:ijms20010152. [PMID: 30609821 PMCID: PMC6337551 DOI: 10.3390/ijms20010152] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/31/2018] [Revised: 12/19/2018] [Accepted: 12/20/2018] [Indexed: 12/24/2022] Open
Abstract
Marine picocyanobacteria belonging to Synechococcus are major contributors to the global carbon cycle; however, genomic information on its cold-adapted members has been lacking to date. To fill this void, the genome of a cold-adapted planktonic cyanobacterium, Synechococcus sp. CS-601 (SynAce01), has been sequenced. The genome of the strain contains a single chromosome of approximately 2.75 Mbp with a GC content of 63.92%. Gene prediction yielded 2984 protein coding sequences and 44 tRNA genes. The genome contained evidence of horizontal gene transfer events during its evolution. CS-601 appears to be a transport generalist with some specific adaptation to an oligotrophic marine environment. It has a broad repertoire of transporters of both inorganic and organic nutrients to survive in inhospitable environments. The cold adaptation of the strain exhibited characteristics of a psychrotroph rather than a psychrophile. Its salt adaptation strategy is likely to rely on the uptake and synthesis of osmolytes, like glycerol or glycine betaine. Overall, the genome reveals two distinct patterns of adaptation to the inhospitable environment of Antarctica. Adaptation to an oligotrophic marine environment is likely due to an abundance of genes, probably acquired horizontally, that are associated with increased transport of nutrients, osmolytes, and light harvesting. On the other hand, adaptations to low temperatures are likely due to prolonged evolutionary changes.
|
33
|
Ma ZS, Li L, Ye C, Peng M, Zhang YP. Hybrid assembly of ultra-long Nanopore reads augmented with 10x-Genomics contigs: Demonstrated with a human genome. Genomics 2018; 111:1896-1901. [PMID: 30594583 DOI: 10.1016/j.ygeno.2018.12.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 10/02/2018] [Revised: 11/17/2018] [Accepted: 12/24/2018] [Indexed: 10/27/2022]
Abstract
Third-generation sequencing (3GS) technologies generate ultra-long reads (up to 1 Mb), which makes it possible to eliminate gaps and effectively resolve repeats in genome assembly. However, the 3GS technologies suffer from high base-level error rates (15%-40%) and high sequencing costs. To address these issues, the hybrid assembly strategy, which utilizes both 3GS reads and inexpensive NGS (next generation sequencing) short reads, was invented. Here, we use 10×-Genomics® technology, which integrates a novel bar-coding strategy with Illumina® NGS and has the advantage of revealing long-range sequence information, to replace common NGS short reads for hybrid assembly of long erroneous 3GS reads. We demonstrate the feasibility of integrating the 3GS with 10×-Genomics technologies for a new strategy of hybrid de novo genome assembly by utilizing the DBG2OLC and Sparc software packages, previously developed by the authors for regular hybrid assembly. Using a human genome as an example, we show that with only 7× coverage of ultra-long Nanopore® reads, augmented with 10× reads, our approach achieved nearly the same level of quality as non-hybrid assembly with 35× coverage of Nanopore reads. Compared with the assembly with 10×-Genomics reads alone, our assembly is gapless at a slightly higher cost. These results suggest that our new hybrid assembly with ultra-long 3GS reads augmented with 10×-Genomics reads offers a low-cost (less than ¼ the cost of the non-hybrid assembly) and computationally lightweight (it took only 109 calendar hours with peak memory usage = 61GB on a dual-CPU office workstation) solution for extending the wide applications of the 3GS technologies.
Affiliation(s)
- Zhanshan Sam Ma
- Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China.
- Lianwei Li
- Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China
- Chengxi Ye
- Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Department of Computer Science, University of Maryland, College Park, MD, USA
- Minsheng Peng
- Molecular Evolution and Genome Diversity Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China; KIZ/CUHK Joint Laboratory of Bio-resources and Molecular Research in Common Diseases, Kunming 650223, China
- Ya-Ping Zhang
- Molecular Evolution and Genome Diversity Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, Chinese Academy of Sciences, Kunming, 650223, China; KIZ/CUHK Joint Laboratory of Bio-resources and Molecular Research in Common Diseases, Kunming 650223, China.
|
34
|
Yu J, Wang F, Zhan X, Wang X, Zuo F, Wei Y, Qi J, Liu Y. Improvement and evaluation of loop-mediated isothermal amplification combined with a chromatographic flow dipstick assay and utilization in detection of Vibrio cholerae. Anal Bioanal Chem 2018; 411:647-658. [PMID: 30506503 DOI: 10.1007/s00216-018-1472-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 09/21/2018] [Revised: 10/25/2018] [Accepted: 11/02/2018] [Indexed: 01/27/2023]
Abstract
Loop-mediated isothermal amplification (LAMP) is a specific, sensitive, and easy-to-perform nucleic acid analytical technique with wide application for diagnosis of disease. Recently, LAMP combined with use of a lateral chromatographic flow dipstick (LFD) has been widely used in nucleic acid detection. However, the LFD mechanism has not been systematically analyzed, and the optimal combination of labeled primers has not been adequately evaluated. We analyzed the LAMP mechanism and discovered that the labeled loop primers played a significant role in the LFD assay. To verify our hypothesis, we developed two LFD assays for Vibrio cholerae to detect the ctxA gene and the 16S-23S ribosomal DNA internal transcribed spacer (ITS). We labeled the inner primers [forward inner primer (FIP) and backward inner primer (BIP)] and loop primers [forward loop primer (LF) and backward loop primer (LB)]. Then the labeled and unlabeled primers were combined to form ten different primer sets. We assessed the specificity, sensitivity, and efficiency of LFD assays with use of different primer compositions. All triple-labeled primer sets resulted in false positive results in the LFD assay, as did the FIP and BIP double-labeled primer set. Other double-labeled-primer sets used in LFD assays showed higher sensitivity than the LAMP assays. Moreover, FIP and LF double-labeled and BIP and LB double-labeled sets had the highest sensitivity. In both cases, assays could be performed in 20 min. We also applied the ITS LFD assays in food samples. The enrichment broths of 112 oyster samples were tested, and the proportion that tested positive by the LFD assays was 6.25%, which was not lower than the rate for the conventional PCR method (5.36%).
Affiliation(s)
- Jia Yu
- College of Life Sciences, Qingdao University, Qingdao, 266071, Shandong, China
- Feixue Wang
- School of Medicine, Nankai University, No. 94 Weijin Road, Nankai District, Tianjin, 300071, China
- Xijing Zhan
- Tianjin International Travel Health Care Center, Tianjin, 300456, China
- Xin Wang
- Tianjin International Travel Health Care Center, Tianjin, 300456, China
- Feng Zuo
- Tianjin International Travel Health Care Center, Tianjin, 300456, China
- Yuxi Wei
- College of Life Sciences, Qingdao University, Qingdao, 266071, Shandong, China
- Jun Qi
- Tianjin International Travel Health Care Center, Tianjin, 300456, China
- Yin Liu
- School of Medicine, Nankai University, No. 94 Weijin Road, Nankai District, Tianjin, 300071, China
35
Sohn JI, Nam JW. The present and future of de novo whole-genome assembly. Brief Bioinform 2018; 19:23-40. [PMID: 27742661 DOI: 10.1093/bib/bbw096] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 06/03/2016] [Indexed: 12/15/2022] Open
Abstract
With the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical or computational challenges in de novo assembly still remain, although many bright ideas and heuristics have been suggested to tackle the challenges in both experimental and computational settings. In this review, we categorize de novo assemblers on the basis of the type of de Bruijn graphs (Hamiltonian and Eulerian) and discuss the challenges of de novo assembly for short NGS reads regarding computational complexity and assembly ambiguity. Then, we discuss how the limitations of the short reads can be overcome by using a single-molecule sequencing platform that generates long reads of up to several kilobases. In fact, the long read assembly has caused a paradigm shift in whole-genome assembly in terms of algorithms and supporting steps. We also summarize (i) hybrid assemblies using both short and long reads and (ii) overlap-based assemblies for long reads and discuss their challenges and future prospects. This review provides guidelines to determine the optimal approach for a given input data type, computational budget or genome.
36
Grohmann A, Vainshtein Y, Euchner E, Grumaz C, Bryniok D, Rabus R, Sohn K. Genetic repertoires of anaerobic microbiomes driving generation of biogas. Biotechnology for Biofuels 2018; 11:255. [PMID: 30250507 PMCID: PMC6146632 DOI: 10.1186/s13068-018-1258-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/06/2018] [Accepted: 09/11/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND Biogas production is an attractive technology for a sustainable generation of renewable energy. Although the microbial community is fundamental for such production, the process control is still limited to technological and chemical parameters. Currently, most of the efforts on microbial management system (MiMaS) are focused on process-specific marker species and community dynamics, but a practical implementation is in its infancy. The high number of unknown and uncharacterized microorganisms in general is one of the reasons hindering further advancements. RESULTS A Biogas Metagenomics Hybrid Assembly (BioMETHA) database, derived from microbiomes of biogas plants, was generated using a dedicated assembly strategy for different metagenomic datasets. Long reads from nanopore sequencing (MinION) were combined with short, more accurate second-generation sequencing reads (Illumina). The hybrid assembly resulted in 231 genomic bins each representing a taxonomic unit with an average completeness of 47%. Functional annotation identified 13,190 non-redundant genes covering roughly 207 k coding sequences. Mapping rates of metagenomics DNA derived from diverse biogas plants and laboratory reactors increased up to 73%. In addition, an EC (enzyme commission) reference sequence collection (ERSC) was generated whose genes are crucial for biogas-related processes, consisting of 235 unique EC numbers organized in 52 metabolic modules. Mapping rates of metatranscriptomic data to this ERSC revealed coverages of up to 93%. Process parameters and imbalances of laboratory reactors could be reconstructed by evaluating abundance of biogas-specific metabolic modules using metatranscriptomic data derived from various fermenter systems. 
CONCLUSION This newly established metagenomic hybrid assembly in combination with an EC reference sequence collection might help to shed light on the microbial dark matter of biogas plants by contributing to the development of a reference for biogas plant microbiome-specific gene sequences. Considering a biogas microbiome as a complex meta-organism expressing a meta-transcriptome, the approach established here could lay the foundation for a function-based microbial management system.
Affiliation(s)
- Anja Grohmann
- University of Stuttgart IGVP, Pfaffenwaldring 31, 70569 Stuttgart, Germany
- Ellen Euchner
- University of Applied Science Hamm-Lippstadt, Marker Allee 76–78, 59063 Hamm, Germany
- Dieter Bryniok
- University of Applied Science Hamm-Lippstadt, Marker Allee 76–78, 59063 Hamm, Germany
- Ralf Rabus
- Institute for Chemistry and Biology of the Marine Environment (ICBM), University of Oldenburg, Carl-von-Ossietzky-Strasse 9-11, 26111 Oldenburg, Germany
- Kai Sohn
- Fraunhofer IGB, Nobelstrasse 12, 70569 Stuttgart, Germany
37
Vidulin V, Šmuc T, Džeroski S, Supek F. The evolutionary signal in metagenome phyletic profiles predicts many gene functions. Microbiome 2018; 6:129. [PMID: 29991352 PMCID: PMC6040064 DOI: 10.1186/s40168-018-0506-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/07/2017] [Accepted: 06/19/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner. RESULTS We evaluated if the evolutionary signal contained in metagenome phyletic profiles (MPP) is predictive of a broad array of gene functions. The MPPs are an encoding of environmental DNA sequencing data that consists of relative abundances of gene families across metagenomes. We find that such MPPs can accurately predict 826 Gene Ontology functional categories, while drawing on human gut microbiomes, ocean metagenomes, and DNA sequences from various other engineered and natural environments. Overall, in this task, the MPPs are highly accurate, and moreover they provide coverage for a set of Gene Ontology terms largely complementary to standard phylogenetic profiles, derived from fully sequenced genomes. We also find that metagenomes approximated from taxon relative abundance obtained via 16S rRNA gene sequencing may provide surprisingly useful predictive models. Crucially, the MPPs derived from different types of environments can infer distinct, non-overlapping sets of gene functions and therefore complement each other. Consistently, simulations on > 5000 metagenomes indicate that the amount of data is not in itself critical for maximizing predictive accuracy, while the diversity of sampled environments appears to be the critical factor for obtaining robust models. CONCLUSIONS In past work, metagenomics has provided invaluable insight into ecology of various habitats, into diversity of microbial life and also into human health and disease mechanisms. We propose that environmental DNA sequencing additionally constitutes a useful tool to predict biological roles of genes, yielding inferences out of reach for existing comparative genomics approaches.
Affiliation(s)
- Vedrana Vidulin
- Faculty of Information Studies, 8000 Novo Mesto, Slovenia
- Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
- Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
- Tomislav Šmuc
- Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
- Sašo Džeroski
- Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
- Fran Supek
- Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
38
Garg S, Rautiainen M, Novak AM, Garrison E, Durbin R, Marschall T. A graph-based approach to diploid genome assembly. Bioinformatics 2018; 34:i105-i114. [PMID: 29949989 PMCID: PMC6022571 DOI: 10.1093/bioinformatics/bty279] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 11/14/2022] Open
Abstract
Motivation Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community. Results We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants. Availability and implementation https://github.com/whatshap/whatshap. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shilpa Garg
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany
- Department of Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
- Mikko Rautiainen
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany
- Department of Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
- Adam M Novak
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Erik Garrison
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
- Richard Durbin
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
- Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, Germany
- Department of Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
39
Price MN, Wetmore KM, Waters RJ, Callaghan M, Ray J, Liu H, Kuehl JV, Melnyk RA, Lamson JS, Suh Y, Carlson HK, Esquivel Z, Sadeeshkumar H, Chakraborty R, Zane GM, Rubin BE, Wall JD, Visel A, Bristow J, Blow MJ, Arkin AP, Deutschbauer AM. Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 2018; 557:503-509. [PMID: 29769716 DOI: 10.1038/s41586-018-0124-0] [Citation(s) in RCA: 287] [Impact Index Per Article: 47.8] [Received: 10/05/2016] [Accepted: 04/09/2018] [Indexed: 01/25/2023]
Abstract
One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.
Affiliation(s)
- Morgan N Price
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Kelly M Wetmore
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- R Jordan Waters
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Mark Callaghan
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Jayashree Ray
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Hualan Liu
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Jennifer V Kuehl
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Ryan A Melnyk
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Jacob S Lamson
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Yumi Suh
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Hans K Carlson
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Zuelma Esquivel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Harini Sadeeshkumar
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Romy Chakraborty
- Climate and Ecosystem Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Grant M Zane
- Department of Biochemistry, University of Missouri, Columbia, MO, USA
- Benjamin E Rubin
- Division of Biological Sciences, University of California, San Diego, CA, USA
- Judy D Wall
- Department of Biochemistry, University of Missouri, Columbia, MO, USA
- Axel Visel
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; School of Natural Sciences, University of California, Merced, CA, USA
- James Bristow
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Matthew J Blow
- Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Adam P Arkin
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; Department of Bioengineering, University of California, Berkeley, CA, USA
- Adam M Deutschbauer
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
40
Li C, Lin F, An D, Wang W, Huang R. Genome Sequencing and Assembly by Long Reads in Plants. Genes (Basel) 2017; 9:E6. [PMID: 29283420 PMCID: PMC5793159 DOI: 10.3390/genes9010006] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 11/20/2017] [Revised: 12/18/2017] [Accepted: 12/18/2017] [Indexed: 11/17/2022] Open
Abstract
Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists' projects.
Affiliation(s)
- Changsheng Li
- College of Agronomy, Shenyang Agricultural University, 120 Dongling Road, Shenyang 110866, China.
- Feng Lin
- College of Bioscience and Biotechnology, Shenyang Agricultural University, 120 Dongling Road, Shenyang 110866, China.
- Dong An
- School of Agriculture and Biology, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China.
- Wenqin Wang
- School of Agriculture and Biology, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai 200240, China.
- Ruidong Huang
- College of Agronomy, Shenyang Agricultural University, 120 Dongling Road, Shenyang 110866, China.
41
Aganezov SS, Alekseyev MA. CAMSA: a tool for comparative analysis and merging of scaffold assemblies. BMC Bioinformatics 2017; 18:496. [PMID: 29244014 PMCID: PMC5731503 DOI: 10.1186/s12859-017-1919-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose positions and orientations along the genome are unknown. While there exist a number of methods for reconstruction of the genome from its scaffolds, utilizing various computational and wet-lab techniques, they often can produce only partial error-prone scaffold assemblies. It therefore becomes important to compare and merge scaffold assemblies produced by different methods, thus combining their advantages and highlighting present conflicts for further investigation. These tasks may be labor intensive if performed manually. RESULTS We present CAMSA, a tool for comparative analysis and merging of two or more given scaffold assemblies. The tool (i) creates an extensive report with several comparative quality metrics; (ii) constructs the most confident merged scaffold assembly; and (iii) provides an interactive framework for a visual comparative analysis of the given assemblies. Among the CAMSA features, only scaffold merging can be evaluated in comparison to existing methods. Namely, it resembles the functionality of assembly reconciliation tools, although their primary targets are somewhat different. Our evaluations show that CAMSA produces merged assemblies of comparable or better quality than existing assembly reconciliation tools while being the fastest in terms of the total running time. CONCLUSIONS CAMSA addresses the current deficiency of tools for automated comparison and analysis of multiple assemblies of the same set of scaffolds. Since there exist numerous methods and techniques for scaffold assembly, identifying similarities and dissimilarities across assemblies produced by different methods is beneficial both for the developers of scaffold assembly algorithms and for the researchers focused on improving draft assemblies of specific organisms.
Affiliation(s)
- Sergey S Aganezov
- Princeton University, 35 Olden St., Princeton, 08450, NJ, USA; ITMO University, 49 Kronverksky Pr., St. Petersburg, 197101, Russia
- Max A Alekseyev
- The George Washington University, 45085 University Dr., Suite 305, Ashburn, 20147, VA, USA
42
Del Cortona A, Leliaert F, Bogaert KA, Turmel M, Boedeker C, Janouškovec J, Lopez-Bautista JM, Verbruggen H, Vandepoele K, De Clerck O. The Plastid Genome in Cladophorales Green Algae Is Encoded by Hairpin Chromosomes. Curr Biol 2017; 27:3771-3782.e6. [PMID: 29199074 DOI: 10.1016/j.cub.2017.11.004] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 09/25/2017] [Revised: 10/30/2017] [Accepted: 11/01/2017] [Indexed: 12/28/2022]
Abstract
Virtually all plastid (chloroplast) genomes are circular double-stranded DNA molecules, typically between 100 and 200 kb in size and encoding circa 80-250 genes. Exceptions to this universal plastid genome architecture are very few and include the dinoflagellates, where genes are located on DNA minicircles. Here we report on the highly deviant chloroplast genome of Cladophorales green algae, which is entirely fragmented into hairpin chromosomes. Short- and long-read high-throughput sequencing of DNA and RNA demonstrated that the chloroplast genes of Boodlea composita are encoded on 1- to 7-kb DNA contigs with an exceptionally high GC content, each containing a long inverted repeat with one or two protein-coding genes and conserved non-coding regions putatively involved in replication and/or expression. We propose that these contigs correspond to linear single-stranded DNA molecules that fold onto themselves to form hairpin chromosomes. The Boodlea chloroplast genes are highly divergent from their corresponding orthologs, and display an alternative genetic code. The origin of this highly deviant chloroplast genome most likely occurred before the emergence of the Cladophorales, and coincided with an elevated transfer of chloroplast genes to the nucleus. A chloroplast genome that is composed only of linear DNA molecules is unprecedented among eukaryotes, and highlights unexpected variation in plastid genome architecture.
Affiliation(s)
- Andrea Del Cortona
- Department of Biology, Phycology Research Group, Ghent University, Krijgslaan 281, 9000 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Zwijnaarde, Belgium; VIB Center for Plant Systems Biology, Technologiepark 927, 9052 Zwijnaarde, Belgium; Bioinformatics Institute Ghent, Ghent University, Technologiepark 927, 9052 Zwijnaarde, Belgium
- Frederik Leliaert
- Department of Biology, Phycology Research Group, Ghent University, Krijgslaan 281, 9000 Ghent, Belgium; Botanic Garden Meise, Nieuwelaan 38, 1860 Meise, Belgium
- Kenny A Bogaert
- Department of Biology, Phycology Research Group, Ghent University, Krijgslaan 281, 9000 Ghent, Belgium
- Monique Turmel
- Institut de Biologie Intégrative et des Systèmes, Département de Biochimie, de Microbiologie et de Bio-informatique, Université Laval, Pavillon Charles-Eugène-Marchand 1030, Avenue de la Médecine, Québec City, QC G1V 0A6, Canada
- Christian Boedeker
- School of Biological Sciences, Victoria University of Wellington, New Kirk Building, Kelburn Parade, P.O. Box 600, Wellington 6012, New Zealand
- Jan Janouškovec
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
- Juan M Lopez-Bautista
- Department of Biological Sciences, The University of Alabama, 300 Hackberry Lane, Tuscaloosa, AL 35484-0345, USA
- Heroen Verbruggen
- School of BioSciences, University of Melbourne, Professors Walk, Melbourne, VIC 3010, Australia
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, 9052 Zwijnaarde, Belgium; VIB Center for Plant Systems Biology, Technologiepark 927, 9052 Zwijnaarde, Belgium; Bioinformatics Institute Ghent, Ghent University, Technologiepark 927, 9052 Zwijnaarde, Belgium
- Olivier De Clerck
- Department of Biology, Phycology Research Group, Ghent University, Krijgslaan 281, 9000 Ghent, Belgium
43
Glycolytic Functions Are Conserved in the Genome of the Wine Yeast Hanseniaspora uvarum, and Pyruvate Kinase Limits Its Capacity for Alcoholic Fermentation. Appl Environ Microbiol 2017; 83:AEM.01580-17. [PMID: 28887422 DOI: 10.1128/aem.01580-17] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 07/19/2017] [Accepted: 09/03/2017] [Indexed: 01/11/2023] Open
Abstract
Hanseniaspora uvarum (anamorph Kloeckera apiculata) is a predominant yeast on wine grapes and other fruits and has a strong influence on wine quality, even when Saccharomyces cerevisiae starter cultures are employed. In this work, we sequenced and annotated approximately 93% of the H. uvarum genome. Southern and synteny analyses were employed to construct a map of the seven chromosomes present in a type strain. Comparative determinations of specific enzyme activities within the fermentative pathway in H. uvarum and S. cerevisiae indicated that the reduced capacity of the former yeast for ethanol production is caused primarily by an ∼10-fold-lower activity of the key glycolytic enzyme pyruvate kinase. The heterologous expression of the encoding gene, H. uvarum PYK1 (HuPYK1), and two genes encoding the phosphofructokinase subunits, HuPFK1 and HuPFK2, in the respective deletion mutants of S. cerevisiae confirmed their functional homology. IMPORTANCE Hanseniaspora uvarum is a predominant yeast species on grapes and other fruits. It contributes significantly to the production of desired as well as unfavorable aroma compounds and thus determines the quality of the final product, especially wine. Despite this obvious importance, knowledge on its genetics is scarce. As a basis for targeted metabolic modifications, here we provide the results of a genomic sequencing approach, including the annotation of 3,010 protein-encoding genes, e.g., those encoding the entire sugar fermentation pathway, key components of stress response signaling pathways, and enzymes catalyzing the production of aroma compounds. Comparative analyses suggest that the low fermentative capacity of H. uvarum compared to that of Saccharomyces cerevisiae can be attributed to low pyruvate kinase activity. The data reported here are expected to aid in establishing H. uvarum as a non-Saccharomyces yeast in starter cultures for wine and cider fermentations.
44
Zaccaron AZ, Woloshuk CP, Bluhm BH. Comparative genomics of maize ear rot pathogens reveals expansion of carbohydrate-active enzymes and secondary metabolism backbone genes in Stenocarpella maydis. Fungal Biol 2017; 121:966-983. [PMID: 29029703 DOI: 10.1016/j.funbio.2017.08.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/08/2017] [Revised: 08/15/2017] [Accepted: 08/18/2017] [Indexed: 12/11/2022]
Abstract
Stenocarpella maydis is a plant pathogenic fungus that causes Diplodia ear rot, one of the most destructive diseases of maize. To date, little information is available regarding the molecular basis of pathogenesis in this organism, in part due to limited genomic resources. In this study, a 54.8 Mb draft genome assembly of S. maydis was obtained with Illumina and PacBio sequencing technologies, and analyzed. Comparative genomic analyses with the predominant maize ear rot pathogens Aspergillus flavus, Fusarium verticillioides, and Fusarium graminearum revealed an expanded set of carbohydrate-active enzymes for cellulose and hemicellulose degradation in S. maydis. Analyses of predicted genes involved in starch degradation revealed six putative α-amylases, four extracellular and two intracellular, and two putative γ-amylases, one of which appears to have been acquired from bacteria via horizontal transfer. Additionally, 87 backbone genes involved in secondary metabolism were identified, which represents one of the largest known assemblages among Pezizomycotina species. Numerous secondary metabolite gene clusters were identified, including two clusters likely involved in the biosynthesis of diplodiatoxin and chaetoglobosins. The draft genome of S. maydis presented here will serve as a useful resource for molecular genetics, functional genomics, and analyses of population diversity in this organism.
Affiliation(s)
- Alex Z Zaccaron
- Department of Plant Pathology, University of Arkansas, Division of Agriculture, Fayetteville, AR 72701, USA
- Charles P Woloshuk
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN, USA
- Burton H Bluhm
- Department of Plant Pathology, University of Arkansas, Division of Agriculture, Fayetteville, AR 72701, USA
45
Optimal hybrid sequencing and assembly: Feasibility conditions for accurate genome reconstruction and cost minimization strategy. Comput Biol Chem 2017; 69:153-163. [DOI: 10.1016/j.compbiolchem.2017.03.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/29/2017] [Accepted: 03/30/2017] [Indexed: 01/10/2023]
46
Haghshenas E, Hach F, Sahinalp SC, Chauve C. CoLoRMap: Correcting Long Reads by Mapping short reads. Bioinformatics 2017; 32:i545-i551. [PMID: 27587673 DOI: 10.1093/bioinformatics/btw463] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 01/27/2023] Open
Abstract
MOTIVATION Second-generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. The recent developments in long-read sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. RESULTS We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. AVAILABILITY AND IMPLEMENTATION The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap CONTACT ehaghshe@sfu.ca or cedric.chauve@sfu.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ehsan Haghshenas
- School of Computing Sciences; MADD-Gen Graduate Program, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Faraz Hach
- School of Computing Sciences; Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- S Cenk Sahinalp
- School of Computing Sciences; Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada; School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
- Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
|
47
|
Utturkar SM, Klingeman DM, Hurt RA, Brown SD. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies. Front Microbiol 2017; 8:1272. [PMID: 28769883 PMCID: PMC5513972 DOI: 10.3389/fmicb.2017.01272] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/08/2017] [Accepted: 06/26/2017] [Indexed: 11/20/2022]
Abstract
This study characterized regions of DNA that remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower-than-average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplications. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
Affiliation(s)
- Sagar M Utturkar
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, United States
- Dawn M Klingeman
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States; BioEnergy Science Center, Oak Ridge, TN, United States
- Richard A Hurt
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Steven D Brown
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, United States; Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States; BioEnergy Science Center, Oak Ridge, TN, United States
|
48
|
Kremer FS, McBride AJA, Pinto LDS. Approaches for in silico finishing of microbial genome sequences. Genet Mol Biol 2017; 40:553-576. [PMID: 28898352 PMCID: PMC5596377 DOI: 10.1590/1678-4685-gmb-2016-0230] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/25/2016] [Accepted: 03/13/2017] [Indexed: 12/15/2022]
Abstract
The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations imposed by short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. Previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even a bacterial one, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize genome assemblies and facilitate the finishing process. The present review aims to explore some free (in many cases open-source) tools that are available to facilitate genome finishing.
Affiliation(s)
- Frederico Schmitt Kremer
- Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
- Alan John Alexander McBride
- Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
- Luciano da Silva Pinto
- Programa de Pós-Graduação em Biotecnologia (PPGB), Centro de Desenvolvimento Tecnológico, Universidade Federal de Pelotas, Pelotas, Brazil
|
49
|
Interpreting Microbial Biosynthesis in the Genomic Age: Biological and Practical Considerations. Mar Drugs 2017; 15:md15060165. [PMID: 28587290 PMCID: PMC5484115 DOI: 10.3390/md15060165] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 04/28/2017] [Revised: 05/22/2017] [Accepted: 05/31/2017] [Indexed: 02/06/2023]
Abstract
Genome mining has become an increasingly powerful, scalable, and economically accessible tool for the study of natural product biosynthesis and drug discovery. However, there remain important biological and practical problems that can complicate or obscure biosynthetic analysis in genomic and metagenomic sequencing projects. Here, we focus on limitations of available technology as well as computational and experimental strategies to overcome them. We review the unique challenges and approaches in the study of symbiotic and uncultured systems, as well as those associated with biosynthetic gene cluster (BGC) assembly and product prediction. Finally, to explore sequencing parameters that affect the recovery and contiguity of large and repetitive BGCs assembled de novo, we simulate Illumina and PacBio sequencing of the Salinispora tropica genome focusing on assembly of the salinilactam (slm) BGC.
|
50
|
Genomic sequencing of a strain of Acinetobacter baumannii and potential mechanisms to antibiotics resistance. Infect Genet Evol 2017; 50:20-24. [DOI: 10.1016/j.meegid.2017.02.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/07/2016] [Revised: 01/02/2017] [Accepted: 02/01/2017] [Indexed: 12/26/2022]
|