1
Si Y, Li H, Li X. Difference Analysis Among Six Kinds of Acceptor Splicing Sequences by the Dispersion Features of 6-mer Subsets in Human Genes. BIOLOGY 2025; 14:206. [PMID: 40001974 PMCID: PMC11853274 DOI: 10.3390/biology14020206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2025] [Revised: 02/07/2025] [Accepted: 02/13/2025] [Indexed: 02/27/2025]
Abstract
Identifying the sequence composition of different splicing modes is a challenge in current research. This study explored the dispersion distributions of 6-mer subsets in human acceptor splicing regions. Without differentiating acceptor splicing modes, obvious differences were observed across the upstream, core, and downstream regions of splicing sites for 16 dispersion distributions. These findings indicate that the dispersion value of each subset can effectively characterize the compositional properties of splicing sequences. When acceptor splicing sequences were classified into common, constitutive, and alternative modes, the differences in dispersion distributions for most of the XY1 6-mer subsets were significant among the three splicing modes. Furthermore, when the alternative splicing mode was classified into normal, exonic, and intronic sub-modes, the differences in dispersion distributions for most of the XY1 6-mer subsets were also significant among the three splicing sub-modes. Our results indicate that dispersion values of XY1 6-mer subsets not only reveal the sequence composition patterns of acceptor splicing regions but also effectively identify the differences in base correlation among various acceptor splicing modes. Our research provides new insights into revealing and predicting different splicing modes.
Affiliation(s)
- Hong Li
- Inner Mongolia Autonomous Region Key Laboratory of Biophysics and Bioinformatics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China; (Y.S.); (X.L.)
2
Tang L, Xu D, Luo L, Ma W, He X, Diao Y, Ke R, Kapranov P. A novel human protein-coding locus identified using a targeted RNA enrichment technique. BMC Biol 2024; 22:273. [PMID: 39593153 PMCID: PMC11590353 DOI: 10.1186/s12915-024-02069-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Accepted: 11/12/2024] [Indexed: 11/28/2024]
Abstract
BACKGROUND Accurate and comprehensive genomic annotation, including the full list of protein-coding genes, is vital for understanding the molecular mechanisms of human biology. We have previously shown that the genome contains a multitude of yet hidden functional exons and transcripts, some of which might represent novel mRNAs. These results resonate with those from other groups and strongly argue that two decades after the completion of the first draft of the human genome sequence, the current annotation of human genes and transcripts remains far from being complete. RESULTS Using a targeted RNA enrichment technique, we showed that one of the novel functional exons previously discovered by us and currently annotated as part of a long non-coding RNA, is actually a part of a novel protein-coding gene, InSETG-4, which encodes a novel human protein with no known homologs or motifs. We found that InSETG-4 is induced by various DNA-damaging agents across multiple cell types and therefore might represent a novel component of DNA damage response. Despite its low abundance in bulk cell populations, InSETG-4 exhibited expression restricted to a small fraction of cells, as demonstrated by the amplification-based single-molecule fluorescence in situ hybridization (asmFISH) analysis. CONCLUSIONS This study argues that yet undiscovered human protein-coding genes exist and provides an example of how targeted RNA enrichment techniques can help to fill this major gap in our knowledge of the information encoded in the human genome.
Affiliation(s)
- Lu Tang
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
- Dongyang Xu
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China.
- Lingcong Luo
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
- Weiyan Ma
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
- Xiaojie He
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
- Yong Diao
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
- Rongqin Ke
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China.
- Philipp Kapranov
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, 361102, China.
3
Wang L, Shi P, Ping Z, Huang Q, Jiang L, Ma N, Wang Q, Xu J, Zou Y, Huang Z. The golden genome annotation of Ganoderma lingzhi reveals a more complex scenario of eukaryotic gene structure and transcription activity. BMC Biol 2024; 22:271. [PMID: 39587587 PMCID: PMC11590231 DOI: 10.1186/s12915-024-02073-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Accepted: 11/18/2024] [Indexed: 11/27/2024]
Abstract
BACKGROUND It is generally accepted that nuclear genes in eukaryotes are located independently on chromosomes and expressed in a monocistronic manner. However, accumulating evidence suggests a more complex landscape of gene structure and transcription. Ganoderma lingzhi, a model medicinal fungus, currently lacks high-quality genome annotation, hindering genetic studies. RESULTS Here, we reported a golden annotation of G. lingzhi, featuring 14,147 high-confidence genes derived from extensive manual corrections. Novel characteristics of gene structure and transcription were identified accordingly. Notably, non-canonical splicing sites accounted for 1.99% of the whole genome, with the predominant types being GC-AG (1.85%), GT-AC (0.05%), and GT-GG (0.04%). A total of 1165 pairs of genes were found to have overlapping transcribed regions, 92.19% of which showed opposite directions of gene transcription. A total of 5,412,158 genetic variations were identified among 13 G. lingzhi strains, and the manually corrected gene sets resulted in enhanced functional annotation of these variations. More than 60% of G. lingzhi genes were alternatively spliced. In addition, we found that two or more protein-coding genes (PCGs) can be transcribed into a single RNA molecule, referred to as polycistronic genes. In total, 1272 polycistronic genes associated with 2815 PCGs were identified. CONCLUSIONS The widespread presence of polycistronic genes in G. lingzhi strongly supports the theory that polycistrons are also present in eukaryotic genomes. The extraordinary gene structure and transcriptional activity uncovered through this golden annotation provide implications for the study of genes, genomes, and related studies in G. lingzhi and other eukaryotes.
Affiliation(s)
- Lining Wang
- Guangdong Engineering Laboratory of Biomass Value-added Utilization, Guangdong Engineering Research & Development Center for Comprehensive Utilization of Plant Fiber, Guangzhou Key Laboratory for Comprehensive Utilization of Plant Fiber, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, 510316, China
- Peiqi Shi
- The Second Clinical College, Guangzhou University of Chinese Medicine, Guangzhou, 510120, China
- Zhaohua Ping
- Guangdong Engineering Laboratory of Biomass Value-added Utilization, Guangdong Engineering Research & Development Center for Comprehensive Utilization of Plant Fiber, Guangzhou Key Laboratory for Comprehensive Utilization of Plant Fiber, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, 510316, China
- Qinghua Huang
- Guangdong Engineering Laboratory of Biomass Value-added Utilization, Guangdong Engineering Research & Development Center for Comprehensive Utilization of Plant Fiber, Guangzhou Key Laboratory for Comprehensive Utilization of Plant Fiber, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, 510316, China
- Liqun Jiang
- Guangdong Engineering Laboratory of Biomass Value-added Utilization, Guangdong Engineering Research & Development Center for Comprehensive Utilization of Plant Fiber, Guangzhou Key Laboratory for Comprehensive Utilization of Plant Fiber, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, 510316, China
- Nianfang Ma
- Guangdong Engineering Laboratory of Biomass Value-added Utilization, Guangdong Engineering Research & Development Center for Comprehensive Utilization of Plant Fiber, Guangzhou Key Laboratory for Comprehensive Utilization of Plant Fiber, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, 510316, China
- Qingfu Wang
- Guangdong Engineering Laboratory of Biomass Value-added Utilization, Guangdong Engineering Research & Development Center for Comprehensive Utilization of Plant Fiber, Guangzhou Key Laboratory for Comprehensive Utilization of Plant Fiber, Institute of Biological and Medical Engineering, Guangdong Academy of Sciences, Guangzhou, 510316, China.
- Jiang Xu
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China.
- Yajie Zou
- Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
- Zhihai Huang
- The Second Clinical College, Guangzhou University of Chinese Medicine, Guangzhou, 510120, China.
4
Lyu X, Li P, Jin L, Yang F, Pucker B, Wang C, Liu L, Zhao M, Shi L, Zhang Y, Yang Q, Xu K, Li X, Hu Z, Yang J, Yu J, Zhang M. Tracing the evolutionary and genetic footprints of atmospheric tillandsioids transition from land to air. Nat Commun 2024; 15:9599. [PMID: 39505856 PMCID: PMC11541568 DOI: 10.1038/s41467-024-53756-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Accepted: 10/22/2024] [Indexed: 11/08/2024]
Abstract
Plant evolution is driven by key innovations of functional traits that enable survival in diverse ecological environments. However, plant adaptive evolution from land to atmospheric niches remains poorly understood. In this study, we use the epiphytic Tillandsioideae subfamily of Bromeliaceae as model plants to explore their origin, evolution and diversification. We provide a comprehensive phylogenetic tree based on nuclear transcriptomic sequences, indicating that core tillandsioids originated approximately 11.3 million years ago in the Andes. The geological uplift of the Andes drove the divergence of tillandsioids into tank-forming and atmospheric types. Our genomic and transcriptomic analyses reveal gene variations and losses associated with adaptive traits such as impounding tanks and absorptive trichomes. Furthermore, we uncover specific nitrogen-fixing bacterial communities in the phyllosphere of tillandsioids as a potential source of nitrogen acquisition. Collectively, our study provides integrative multi-omics insights into the adaptive evolution of tillandsioids in response to elevated aerial habitats.
Affiliation(s)
- Xiaolong Lyu
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Ping Li
- Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- Liang Jin
- Zhejiang Institute of Landscape Plants and Flowers, Zhejiang Academy of Agricultural Sciences, Hangzhou, 311251, China
- Feng Yang
- BGI Research, Sanya, 572025, China
- State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, Guangdong, China
- Boas Pucker
- Institute of Plant Biology, TU Braunschweig, Mendelssohnstraße 4, Braunschweig, 38106, Germany
- Chenhao Wang
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Linye Liu
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Meng Zhao
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Lu Shi
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Yutong Zhang
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Qinrong Yang
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Kuangtian Xu
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Xiao Li
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Zhongyuan Hu
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Hainan Institute of Zhejiang University, Sanya, 572025, China
- Jinghua Yang
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China
- Hainan Institute of Zhejiang University, Sanya, 572025, China
- Jingquan Yu
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China.
- Hainan Institute of Zhejiang University, Sanya, 572025, China.
- Mingfang Zhang
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310058, China.
- Hainan Institute of Zhejiang University, Sanya, 572025, China.
5
Liu X, Zhang H, Zeng Y, Zhu X, Zhu L, Fu J. DRANetSplicer: A Splice Site Prediction Model Based on Deep Residual Attention Networks. Genes (Basel) 2024; 15:404. [PMID: 38674339 PMCID: PMC11048956 DOI: 10.3390/genes15040404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 03/20/2024] [Accepted: 03/23/2024] [Indexed: 04/28/2024]
Abstract
The precise identification of splice sites is essential for unraveling the structure and function of genes, constituting a pivotal step in the gene annotation process. In this study, we developed a novel deep learning model, DRANetSplicer, that integrates residual learning and attention mechanisms for enhanced accuracy in capturing the intricate features of splice sites. We constructed multiple datasets using the most recent versions of genomic data from three different organisms, Oryza sativa japonica, Arabidopsis thaliana and Homo sapiens. This approach allows us to train models with a richer set of high-quality data. DRANetSplicer outperformed benchmark methods on donor and acceptor splice site datasets, achieving an average accuracy of (96.57%, 95.82%) across the three organisms. Comparative analyses with benchmark methods, including SpliceFinder, Splice2Deep, Deep Splicer, EnsembleSplice, and DNABERT, revealed DRANetSplicer's superior predictive performance, resulting in at least a (4.2%, 11.6%) relative reduction in average error rate. We utilized the DRANetSplicer model trained on O. sativa japonica data to predict splice sites in A. thaliana, achieving accuracies for donor and acceptor sites of (94.89%, 94.25%). These results indicate that DRANetSplicer possesses excellent cross-organism predictive capabilities, with its performance in cross-organism predictions even surpassing that of benchmark methods in non-cross-organism predictions. Cross-organism validation showcased DRANetSplicer's excellence in predicting splice sites across similar organisms, supporting its applicability in gene annotation for understudied organisms. We employed multiple methods to visualize the decision-making process of the model. The visualization results indicate that DRANetSplicer can learn and interpret well-known biological features, further validating its overall performance. Our study systematically examined and confirmed the predictive ability of DRANetSplicer from various levels and perspectives, indicating that its practical application in gene annotation is justified.
Affiliation(s)
- Xueyan Liu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
- Hongyan Zhang
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
- Ying Zeng
- School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411104, China;
- Xinghui Zhu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
- Lei Zhu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
- Jiahui Fu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
6
Larue GE, Roy SW. Where the minor things are: a pan-eukaryotic survey suggests neutral processes may explain much of minor intron evolution. Nucleic Acids Res 2023; 51:10884-10908. [PMID: 37819006 PMCID: PMC10639083 DOI: 10.1093/nar/gkad797] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/24/2023] [Revised: 09/12/2023] [Accepted: 09/19/2023] [Indexed: 10/13/2023]
Abstract
Spliceosomal introns are gene segments removed from RNA transcripts by ribonucleoprotein machineries called spliceosomes. In some eukaryotes a second 'minor' spliceosome is responsible for processing a tiny minority of introns. Despite its seemingly modest role, minor splicing has persisted for roughly 1.5 billion years of eukaryotic evolution. Identifying minor introns in over 3000 eukaryotic genomes, we report diverse evolutionary histories including surprisingly high numbers in some fungi and green algae, repeated loss, as well as general biases in their positional and genic distributions. We estimate that ancestral minor intron densities were comparable to those of vertebrates, suggesting a trend of long-term stasis. Finally, three findings suggest a major role for neutral processes in minor intron evolution. First, highly similar patterns of minor and major intron evolution contrast with both functionalist and deleterious model predictions. Second, observed functional biases among minor intron-containing genes are largely explained by these genes' greater ages. Third, no association of intron splicing with cell proliferation in a minor intron-rich fungus suggests that regulatory roles are lineage-specific and thus cannot offer a general explanation for minor splicing's persistence. These data constitute the most comprehensive view of minor introns and their evolutionary history to date, and provide a foundation for future studies of these remarkable genetic elements.
Affiliation(s)
- Graham E Larue
- Quantitative and Systems Biology Graduate Program, University of California Merced, Merced, CA 95343, USA
- Scott W Roy
- Department of Molecular and Cell Biology, University of California Merced, Merced, CA 95343, USA
- Department of Biology, San Francisco State University, San Francisco, CA 94132, USA
7
Seabolt MH, Roellig DM, Konstantinidis KT. Spliceosomal introns in the diplomonad parasite Giardia duodenalis revisited. Microb Genom 2023; 9. [PMID: 37934076 DOI: 10.1099/mgen.0.001117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
Complete reference genomes, including correct feature annotations, are a fundamental aspect of genomic biology. In the case of protozoan species such as Giardia duodenalis, a major human and animal parasite worldwide, accurate genome annotation can deepen our understanding of the evolution of parasitism and pathogenicity by identifying genes underlying key traits and clinically relevant cellular mechanisms, and by extension, the development of improved prevention strategies and treatments. This study used bioinformatics analyses of Giardia mRNA libraries to characterize known introns and identify new intron candidates, working towards completion of the G. duodenalis assemblage A strain 'WB' genome and further elucidating Giardia's gene expression. By using a set of experimentally validated positive control loci to calibrate our intron detection pipeline, we were able to detect evidence of previously missed candidate splice junctions directly from expressed transcript data. These intron candidates were further studied in silico using NMDS (non-metric multidimensional scaling) clustering to determine shared characteristics and their relative importance such as secondary structure, splicing efficiency and motif conservation, and thus to refine intron models. Results from this study identified 34 new intron candidates, with several potential introns showing evidence that secondary structure of the mRNA molecule might play a more significant role in splicing than previously reported eukaryotic splicing activity mediated by a reduced spliceosome present in G. duodenalis.
Affiliation(s)
- Matthew H Seabolt
- Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30329, USA
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Leidos Inc., Reston, VA 20190, USA
- Dawn M Roellig
- Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30329, USA
- Konstantinos T Konstantinidis
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
8
Ingvardsen CR, Massange-Sánchez JA, Borum F, Füchtbauer WS, Bagge M, Knudsen S, Gregersen PL. Highly effective mlo-based powdery mildew resistance in hexaploid wheat without pleiotropic effects. PLANT SCIENCE: AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2023; 335:111785. [PMID: 37419327 DOI: 10.1016/j.plantsci.2023.111785] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/17/2023] [Revised: 06/07/2023] [Accepted: 07/02/2023] [Indexed: 07/09/2023]
Abstract
Application of the mlo-based resistance in barley against powdery mildew attacks is a major success in crop breeding, since it confers durable disease resistance. Resistance caused by mutations in the Mlo gene seems to be ubiquitous across a range of species. This work addresses the introduction of mlo-based resistance into hexaploid wheat, which is complicated by the occurrence of three homoeologous genes: Mlo-A1, Mlo-B1 and Mlo-D1. EMS-generated mutant plants were screened for mutations in the three homoeologues. We selected and combined 6, 8, and 4 mutations, respectively, to obtain triple homozygous mlo mutant lines. Twenty-four mutant lines showed highly effective resistance towards attack by the powdery mildew pathogen under field conditions. All 18 mutations appeared to contribute to resistance; however, they had different effects on the occurrence of symptoms such as chlorotic and necrotic spots, which are pleiotropic to the mlo-based powdery mildew resistance. We conclude that to obtain highly effective powdery mildew resistance in wheat and to avoid detrimental pleiotropic effects, all three Mlo homoeologues should be mutated; however, at least one of the mutations should be of the weaker type in order to alleviate strong pleiotropic effects from the other mutations.
Affiliation(s)
- Julio A Massange-Sánchez
- Department of Agroecology, AU-Flakkebjerg, Aarhus University, DK-4200 Slagelse, Denmark; Centro de Investigación y Asistencia en Tecnología y Diseño del Estado de Jalisco A.C. (CIATEJ), Unidad de Biotecnología Vegetal, 44270 Guadalajara, Mexico
- Finn Borum
- Sejet Plant Breeding, Noerremarksvej 67, DK-8700 Horsens, Denmark
- Merethe Bagge
- Sejet Plant Breeding, Noerremarksvej 67, DK-8700 Horsens, Denmark; DANESPO, Dyrskuevej 15, DK-7323 Give, Denmark
- Søren Knudsen
- Carlsberg Research Laboratory, J.C. Jacobsens Gade 4, DK-1799 Copenhagen V, Denmark
- Per L Gregersen
- Department of Agroecology, AU-Flakkebjerg, Aarhus University, DK-4200 Slagelse, Denmark.
9
Cheng W, Hong C, Zeng F, Liu N, Gao H. Sequence variations affect the 5' splice site selection of plant introns. PLANT PHYSIOLOGY 2023; 193:1281-1296. [PMID: 37394939 DOI: 10.1093/plphys/kiad375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 05/31/2023] [Accepted: 06/04/2023] [Indexed: 07/04/2023]
Abstract
Introns are noncoding sequences spliced out of pre-mRNAs by the spliceosome to produce mature mRNAs. The 5' ends of introns mostly begin with GU and have a conserved sequence motif of AG/GUAAGU that could base-pair with the core sequence of U1 snRNA of the spliceosome. Intriguingly, ∼ 1% of introns in various eukaryotic species begin with GC. This occurrence could cause misannotation of genes; however, the underlying splicing mechanism is unclear. We analyzed the sequences around the intron 5' splice site (ss) in Arabidopsis (Arabidopsis thaliana) and found sequences at the GC intron ss are much more stringent than those of GT introns. Mutational analysis at various positions of the intron 5' ss revealed that although mutations impair base pairing, different mutations at the same site can have different effects, suggesting that steric hindrance also affects splicing. Moreover, mutations of 5' ss often activate a hidden ss nearby. Our data suggest that the 5' ss is selected via a competition between the major ss and the nearby minor ss. This work not only provides insights into the splicing mechanism of intron 5' ss but also improves the accuracy of gene annotation and the study of the evolution of intron 5' ss.
Affiliation(s)
- Wenzhen Cheng
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Conghao Hong
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Fang Zeng
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Nan Liu
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
- Hongbo Gao
- National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, China
10
Schilbert HM, Holzenkamp K, Viehöver P, Holtgräwe D, Möllers C. Homoeologous non-reciprocal translocation explains a major QTL for seed lignin content in oilseed rape (Brassica napus L.). TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2023; 136:172. [PMID: 37439815 PMCID: PMC10345078 DOI: 10.1007/s00122-023-04407-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 06/22/2023] [Indexed: 07/14/2023]
Abstract
A homoeologous non-reciprocal translocation was identified in the major QTL for seed lignin content in the low lignin line SGDH14. The lignin biosynthetic gene PAL4 was deleted. Oilseed rape is a major oil crop and a valuable protein source for animal and human nutrition. Lignin is a non-digestible, major component of the seed coat with a negative effect on sensory quality, bioavailability and usage of oilseed rape's protein. Hence, seed lignin reduction is of economic and nutritional importance. In this study, the major QTL for reduced lignin content found on chromosome C05 in the DH population SGDH14 x Express 617 was further examined. SGDH14 had lower seed lignin content than Express 617. Harvested seeds from an F2 population of the same cross were additionally field tested and used for seed quality analysis. The F2 population showed a bimodal distribution for seed lignin content. F2 plants with low lignin content had thinner seed coats compared to high lignin lines. Both groups showed a dark seed colour with a slightly lighter colour in the low lignin group, indicating that a low lignin content is not necessarily associated with yellow seed colour. Mapping of genomic long-reads from SGDH14 against the Express 617 genome assembly revealed a homoeologous non-reciprocal translocation (HNRT) in the confidence interval of the major QTL for lignin content. A homologous A05 region was duplicated and replaced the C05 region in SGDH14. As a consequence, several genes located in the C05 region were lost in SGDH14. Thus, an HNRT was identified in the major QTL region for reduced lignin content in the low lignin line SGDH14. The most promising candidate gene related to lignin biosynthesis on C05, PAL4, was deleted.
Affiliation(s)
- Hanna Marie Schilbert
- Genetics and Genomics of Plants, CeBiTec and Faculty of Biology, Bielefeld University, Bielefeld, Germany.
- Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Faculty of Technology, Bielefeld University, Bielefeld, Germany.
- Karin Holzenkamp
- Department of Crop Sciences, Division of Crop Plant Genetics, Georg-August-University, Göttingen, Germany
- Prisca Viehöver
- Genetics and Genomics of Plants, CeBiTec and Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Daniela Holtgräwe
- Genetics and Genomics of Plants, CeBiTec and Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Christian Möllers
- Department of Crop Sciences, Division of Crop Plant Genetics, Georg-August-University, Göttingen, Germany
11
Bartas M, Volna A, Cerven J, Pucker B. Identification of annotation artifacts concerning the chalcone synthase (CHS). BMC Res Notes 2023; 16:109. [PMID: 37340477 DOI: 10.1186/s13104-023-06386-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 06/13/2023] [Indexed: 06/22/2023]
Abstract
OBJECTIVE Chalcone synthase (CHS) catalyzes the initial step of the flavonoid biosynthesis. The CHS encoding gene is well studied in numerous plant species. Rapidly growing sequence databases contain hundreds of CHS entries that are the result of automatic annotation. In this study, we evaluated apparent multiplication of CHS domains in CHS gene models of four plant species. MAIN FINDINGS CHS genes with an apparent triplication of the CHS domain encoding part were discovered through database searches. Such genes were found in Macadamia integrifolia, Musa balbisiana, Musa troglodytarum, and Nymphaea colorata. A manual inspection of the CHS gene models in these four species with massive RNA-seq data suggests that these gene models are the result of artificial fusions in the annotation process. While there are hundreds of seemingly correct CHS records in the databases, it is not clear why these annotation artifacts appeared.
Affiliation(s)
- Martin Bartas
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
| | - Adriana Volna
- Department of Physics, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
| | - Jiri Cerven
- Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
| | - Boas Pucker
- Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany.
| |
|
12
|
Luo C, Yan J, Liu W, Xu Y, Sun P, Wang M, Xie D, Jiang B. Genetic mapping and genome-wide association study identify BhYAB4 as the candidate gene regulating seed shape in wax gourd (Benincasa hispida). FRONTIERS IN PLANT SCIENCE 2022; 13:961864. [PMID: 36161030 PMCID: PMC9493316 DOI: 10.3389/fpls.2022.961864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Accepted: 07/06/2022] [Indexed: 06/16/2023]
Abstract
Wax gourd is an important vegetable crop of the Cucurbitaceae family. According to the shape and structure of the seed coat, the seeds of the wax gourd can be divided into bilateral and unilateral. Bilateral seeds usually germinate more quickly and have a higher germination rate than unilateral seeds. Therefore, wax gourd varieties with bilateral seeds are preferred by seed companies and growers. However, the genetic basis and molecular mechanism regulating seed shape remain unclear in the wax gourd. In this study, genetic analysis demonstrated that the seed shape of wax gourd was controlled by a single gene, with bilateral dominant to unilateral. By combining genetic mapping with a genome-wide association study, Bhi04G000544 (BhYAB4), encoding a YABBY transcription factor, was identified as the candidate gene for seed shape determination in the wax gourd. A G/A single nucleotide polymorphism in BhYAB4 was detected among different germplasm resources, with BhYAB4G specifically enriched in bilateral seeds and BhYAB4A in unilateral seeds. The G to A mutation caused intron retention and a premature stop codon in BhYAB4. Expression analysis showed that both BhYAB4G and BhYAB4A were highly expressed in seeds, while the nuclear localization of BhYAB4A protein was disturbed compared with that of BhYAB4G protein. Finally, a derived cleaved amplified polymorphic sequence marker that could efficiently distinguish between bilateral and unilateral seeds was developed, thereby facilitating the molecular marker-assisted breeding of wax gourd cultivars.
Affiliation(s)
- Chen Luo
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Jinqiang Yan
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Wenrui Liu
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Yuanchao Xu
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Piaoyun Sun
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Min Wang
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Dasen Xie
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| | - Biao Jiang
- Vegetable Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Guangdong Key Laboratory for New Technology Research of Vegetables, Guangzhou, China
| |
|
13
|
Schilbert HM, Pucker B, Ries D, Viehöver P, Micic Z, Dreyer F, Beckmann K, Wittkop B, Weisshaar B, Holtgräwe D. Mapping‑by‑Sequencing Reveals Genomic Regions Associated with Seed Quality Parameters in Brassica napus. Genes (Basel) 2022; 13:genes13071131. [PMID: 35885914 PMCID: PMC9317104 DOI: 10.3390/genes13071131] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/31/2022] [Revised: 06/15/2022] [Accepted: 06/22/2022] [Indexed: 11/21/2022]
Abstract
Rapeseed (Brassica napus L.) is an important oil crop and has the potential to serve as a highly productive source of protein. This protein exhibits an excellent amino acid composition and has high nutritional value for humans. Seed protein content (SPC) and seed oil content (SOC) are two complex quantitative and polygenic traits which are negatively correlated and assumed to be controlled by additive and epistatic effects. A reduction in seed glucosinolate (GSL) content is desired as GSLs cause a stringent and bitter taste. The goal here was the identification of genomic intervals relevant for seed GSL content and SPC/SOC. Mapping by sequencing (MBS) revealed 30 and 15 new and known genomic intervals associated with seed GSL content and SPC/SOC, respectively. Within these intervals, we identified known but also so far unknown putatively causal genes and sequence variants. A 4 bp insertion in the MYB28 homolog on C09 shows a significant association with a reduction in seed GSL content. This study provides insights into the genetic architecture and potential mechanisms underlying seed quality traits, which will enhance future breeding approaches in B. napus.
Affiliation(s)
- Hanna Marie Schilbert
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, Universitätsstraße 27, 33615 Bielefeld, Germany; (H.M.S.); (B.P.); (D.R.); (P.V.); (B.W.)
- Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Faculty of Technology, Bielefeld University, Universitätsstraße 27, 33615 Bielefeld, Germany
| | - Boas Pucker
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, Universitätsstraße 27, 33615 Bielefeld, Germany; (H.M.S.); (B.P.); (D.R.); (P.V.); (B.W.)
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Mendelssohnstraße 4, 38106 Braunschweig, Germany
| | - David Ries
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, Universitätsstraße 27, 33615 Bielefeld, Germany; (H.M.S.); (B.P.); (D.R.); (P.V.); (B.W.)
| | - Prisca Viehöver
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, Universitätsstraße 27, 33615 Bielefeld, Germany; (H.M.S.); (B.P.); (D.R.); (P.V.); (B.W.)
| | - Zeljko Micic
- Deutsche Saatveredelung AG, Weissenburger Straße 5, 59557 Lippstadt, Germany;
| | - Felix Dreyer
- NPZ Innovation GmbH, Hohenlieth-Hof 1, 24363 Holtsee, Germany; (F.D.); (K.B.)
| | - Katrin Beckmann
- NPZ Innovation GmbH, Hohenlieth-Hof 1, 24363 Holtsee, Germany; (F.D.); (K.B.)
| | - Benjamin Wittkop
- Department of Plant Breeding, Justus Liebig University, Heinrich-Buff-Ring 26-32, 35392 Giessen, Germany;
| | - Bernd Weisshaar
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, Universitätsstraße 27, 33615 Bielefeld, Germany; (H.M.S.); (B.P.); (D.R.); (P.V.); (B.W.)
| | - Daniela Holtgräwe
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, Universitätsstraße 27, 33615 Bielefeld, Germany; (H.M.S.); (B.P.); (D.R.); (P.V.); (B.W.)
| |
|
14
|
Ballmann R, Hotop SK, Bertoglio F, Steinke S, Heine PA, Chaudhry MZ, Jahn D, Pucker B, Baldanti F, Piralla A, Schubert M, Čičin-Šain L, Brönstrup M, Hust M, Dübel S. ORFeome Phage Display Reveals a Major Immunogenic Epitope on the S2 Subdomain of SARS-CoV-2 Spike Protein. Viruses 2022; 14:1326. [PMID: 35746797 PMCID: PMC9229677 DOI: 10.3390/v14061326] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/20/2022] [Revised: 06/14/2022] [Accepted: 06/15/2022] [Indexed: 02/01/2023]
Abstract
The development of antibody therapies against SARS-CoV-2 remains a challenging task during the ongoing COVID-19 pandemic. All approved therapeutic antibodies are directed against the receptor binding domain (RBD) of the spike, and therefore lose neutralization efficacy against emerging SARS-CoV-2 variants, which frequently mutate in the RBD region. Previously, phage display has been used to identify epitopes of antibody responses against several diseases. Such epitopes have been applied to design vaccines or neutralize antibodies. Here, we constructed an ORFeome phage display library for the SARS-CoV-2 genome. Open reading frames (ORFs) representing the SARS-CoV-2 genome were displayed on the surface of phage particles in order to identify enriched immunogenic epitopes from COVID-19 patients. Library quality was assessed by both NGS and epitope mapping of a monoclonal antibody with a known binding site. The most prominent epitope captured represented parts of the fusion peptide (FP) of the spike. It is associated with the cell entry mechanism of SARS-CoV-2 into the host cell; the serine protease TMPRSS2 cleaves the spike within this sequence. Blocking this mechanism could be a potential target for non-RBD binding therapeutic anti-SARS-CoV-2 antibodies. As mutations within the FP amino acid sequence have been rather rare among SARS-CoV-2 variants so far, this may provide an advantage in the fight against future virus variants.
Affiliation(s)
- Rico Ballmann
- Institut für Biochemie, Biotechnologie und Bioinformatik, Abteilung Biotechnologie, Technische Universität Braunschweig, Spielmannstr 7, 38106 Braunschweig, Germany; (F.B.); (S.S.); (P.A.H.); (M.S.)
| | - Sven-Kevin Hotop
- Helmholtz Centre for Infection Research, Inhoffenstr. 7, 38124 Braunschweig, Germany; (S.-K.H.); (M.Z.C.); (L.Č.-Š.); (M.B.)
| | - Federico Bertoglio
- Institut für Biochemie, Biotechnologie und Bioinformatik, Abteilung Biotechnologie, Technische Universität Braunschweig, Spielmannstr 7, 38106 Braunschweig, Germany; (F.B.); (S.S.); (P.A.H.); (M.S.)
| | - Stephan Steinke
- Institut für Biochemie, Biotechnologie und Bioinformatik, Abteilung Biotechnologie, Technische Universität Braunschweig, Spielmannstr 7, 38106 Braunschweig, Germany; (F.B.); (S.S.); (P.A.H.); (M.S.)
| | - Philip Alexander Heine
- Institut für Biochemie, Biotechnologie und Bioinformatik, Abteilung Biotechnologie, Technische Universität Braunschweig, Spielmannstr 7, 38106 Braunschweig, Germany; (F.B.); (S.S.); (P.A.H.); (M.S.)
| | - M. Zeeshan Chaudhry
- Helmholtz Centre for Infection Research, Inhoffenstr. 7, 38124 Braunschweig, Germany; (S.-K.H.); (M.Z.C.); (L.Č.-Š.); (M.B.)
| | - Dieter Jahn
- Institut für Mikrobiologie, Technische Universität Braunschweig, Spielmannstr. 7, 38106 Braunschweig, Germany;
| | - Boas Pucker
- Institute of Plant Biology, Technische Universität Braunschweig, Humboldtstr 1, 38106 Braunschweig, Germany;
| | - Fausto Baldanti
- Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, 27100 Pavia, Italy;
- Molecular Virology Unit, Microbiology and Virology Department, IRCCS Fondazione Policlinico, 27100 Pavia, Italy;
| | - Antonio Piralla
- Molecular Virology Unit, Microbiology and Virology Department, IRCCS Fondazione Policlinico, 27100 Pavia, Italy;
| | - Maren Schubert
- Institut für Biochemie, Biotechnologie und Bioinformatik, Abteilung Biotechnologie, Technische Universität Braunschweig, Spielmannstr 7, 38106 Braunschweig, Germany; (F.B.); (S.S.); (P.A.H.); (M.S.)
| | - Luka Čičin-Šain
- Helmholtz Centre for Infection Research, Inhoffenstr. 7, 38124 Braunschweig, Germany; (S.-K.H.); (M.Z.C.); (L.Č.-Š.); (M.B.)
| | - Mark Brönstrup
- Helmholtz Centre for Infection Research, Inhoffenstr. 7, 38124 Braunschweig, Germany; (S.-K.H.); (M.Z.C.); (L.Č.-Š.); (M.B.)
| | - Michael Hust
- Institut für Biochemie, Biotechnologie und Bioinformatik, Abteilung Biotechnologie, Technische Universität Braunschweig, Spielmannstr 7, 38106 Braunschweig, Germany; (F.B.); (S.S.); (P.A.H.); (M.S.)
| | - Stefan Dübel
- Institut für Biochemie, Biotechnologie und Bioinformatik, Abteilung Biotechnologie, Technische Universität Braunschweig, Spielmannstr 7, 38106 Braunschweig, Germany; (F.B.); (S.S.); (P.A.H.); (M.S.)
| |
|
15
|
Liu Y, Sheng W, Wu J, Zheng J, Zhi X, Zhang S, Gu C, Guo D, Wang W. Case report: Altered pre-mRNA splicing caused by intronic variant c.1499 + 1G > A in the SLC4A4 gene. Front Pediatr 2022; 10:890147. [PMID: 36061388 PMCID: PMC9428394 DOI: 10.3389/fped.2022.890147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2022] [Accepted: 07/25/2022] [Indexed: 11/29/2022]
Abstract
Proximal renal tubular acidosis (pRTA) with ocular abnormalities is an autosomal recessive disease caused by variants in the Solute Carrier Family 4 Member 4 (SLC4A4) gene. Patients present with metabolic acidosis and low plasma bicarbonate concentration (3∼17 mmol/L), often accompanied by ocular abnormalities, intellectual disability, and growth retardation. In this study, the patient underwent whole exome sequencing (WES) and bioinformatics analysis of variant pathogenicity. A minigene assay was then conducted to further analyze the splice-site variant. Compound heterozygous variants in the SLC4A4 gene (NM_003759.3), c.145C > T (p.Arg49*) and c.1499 + 1G > A, were detected by WES. The minigene assay showed an mRNA splicing aberration caused by the c.1499 + 1G > A variant. Compared with the wild type, the mutant caused a 4-base insertion between exons 10 and 11 of SLC4A4 after expression in HEK293 cells. In conclusion, the c.1499 + 1G > A variant in the SLC4A4 gene may be one of the genetic causes of the disease in the patient. Moreover, our study provides a foundation for future gene therapy of such pathogenic variants.
Affiliation(s)
- Yan Liu
- Department of Nephrology, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China
| | - Wenchao Sheng
- Graduate College of Tianjin Medical University, Tianjin Medical University, Tianjin, China.,Tianjin Pediatric Research Institute, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China.,Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, Tianjin, China
| | - Jinying Wu
- Tianjin Pediatric Research Institute, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China.,Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, Tianjin, China
| | - Jie Zheng
- Tianjin Pediatric Research Institute, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China.,Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, Tianjin, China
| | - Xiufang Zhi
- Graduate College of Tianjin Medical University, Tianjin Medical University, Tianjin, China.,Tianjin Pediatric Research Institute, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China.,Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, Tianjin, China
| | - Shuyue Zhang
- Graduate College of Tianjin Medical University, Tianjin Medical University, Tianjin, China.,Tianjin Pediatric Research Institute, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China.,Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, Tianjin, China
| | - Chunyu Gu
- Graduate College of Tianjin Medical University, Tianjin Medical University, Tianjin, China.,Tianjin Pediatric Research Institute, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China.,Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, Tianjin, China
| | - Detong Guo
- Graduate College of Tianjin Medical University, Tianjin Medical University, Tianjin, China.,Tianjin Pediatric Research Institute, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China.,Tianjin Key Laboratory of Birth Defects for Prevention and Treatment, Tianjin, China
| | - Wenhong Wang
- Department of Nephrology, Tianjin Children's Hospital (Tianjin University Children's Hospital), Tianjin, China
| |
|
16
|
Scalzitti N, Kress A, Orhand R, Weber T, Moulinier L, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. Spliceator: multi-species splice site prediction using convolutional neural networks. BMC Bioinformatics 2021; 22:561. [PMID: 34814826 PMCID: PMC8609763 DOI: 10.1186/s12859-021-04471-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 07/04/2021] [Accepted: 11/09/2021] [Indexed: 12/14/2022]
Abstract
Background: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking. Results: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89–92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms. Conclusions: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04471-3.
Affiliation(s)
- Nicolas Scalzitti
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Arnaud Kress
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.,BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Romain Orhand
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Thomas Weber
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Luc Moulinier
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.,BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Anne Jeannin-Girardon
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Pierre Collet
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Julie D Thompson
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.
| |
|
17
|
Top O, Milferstaedt SWL, van Gessel N, Hoernstein SNW, Özdemir B, Decker EL, Reski R. Expression of a human cDNA in moss results in spliced mRNAs and fragmentary protein isoforms. Commun Biol 2021; 4:964. [PMID: 34385580 PMCID: PMC8361020 DOI: 10.1038/s42003-021-02486-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/02/2020] [Accepted: 07/26/2021] [Indexed: 12/18/2022]
Abstract
Production of biopharmaceuticals relies on the expression of mammalian cDNAs in host organisms. Here we show that the expression of a human cDNA in the moss Physcomitrium patens generates the expected full-length and four additional transcripts due to unexpected splicing. This mRNA splicing results in non-functional protein isoforms, cellular misallocation of the proteins and low product yields. We integrated these results together with the results of our analysis of all 32,926 protein-encoding Physcomitrella genes and their 87,533 annotated transcripts in a web application, physCO, for automatized optimization. A thus optimized cDNA results in about twelve times more protein, which correctly localizes to the ER. An analysis of codon preferences of different production hosts suggests that similar effects occur also in non-plant hosts. We anticipate that the use of our methodology will prevent so far undetected mRNA heterosplicing resulting in maximized functional protein amounts for basic biology and biotechnology.
Affiliation(s)
- Oguz Top
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg, Germany
- Plant Molecular Cell Biology, Department Biology I, LMU Biocenter, Ludwig-Maximilians-University Munich, Planegg-Martinsried, Germany
| | - Stella W L Milferstaedt
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Cluster of Excellence livMatS @ FIT - Freiburg Center for Interactive Materials and Bioinspired Technologies, University of Freiburg, Freiburg, Germany
| | - Nico van Gessel
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
| | | | - Bugra Özdemir
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
| | - Eva L Decker
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany
| | - Ralf Reski
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Freiburg, Germany.
- Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg, Germany.
- Cluster of Excellence livMatS @ FIT - Freiburg Center for Interactive Materials and Bioinspired Technologies, University of Freiburg, Freiburg, Germany.
- CIBSS - Centre for Integrative Biological Signalling Studies, Freiburg, Germany.
| |
|
18
|
Pucker B, Kleinbölting N, Weisshaar B. Large scale genomic rearrangements in selected Arabidopsis thaliana T-DNA lines are caused by T-DNA insertion mutagenesis. BMC Genomics 2021; 22:599. [PMID: 34362298 PMCID: PMC8348815 DOI: 10.1186/s12864-021-07877-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 03/07/2021] [Accepted: 07/06/2021] [Indexed: 01/04/2023]
Abstract
BACKGROUND: Experimental proof of gene function assignments in plants is based on mutant analyses. T-DNA insertion lines provided an invaluable resource of mutants and enabled systematic reverse genetics-based investigation of the functions of Arabidopsis thaliana genes during the last decades. RESULTS: We sequenced the genomes of 14 A. thaliana GABI-Kat T-DNA insertion lines, which eluded flanking sequence tag-based attempts to characterize their insertion loci, with Oxford Nanopore Technologies (ONT) long reads. Complex T-DNA insertions were resolved and 11 previously unknown T-DNA loci identified, resulting in about 2 T-DNA insertions per line and suggesting that this number was previously underestimated. T-DNA mutagenesis caused fusions of chromosomes along with compensating translocations to keep the gene set complete throughout meiosis. Also, an inverted duplication of 800 kbp was detected. About 10% of GABI-Kat lines might be affected by chromosomal rearrangements, some of which do not involve T-DNA. Local assembly of selected reads was shown to be a computationally effective method to resolve the structure of T-DNA insertion loci. We developed an automated workflow to support investigation of long read data from T-DNA insertion lines. All steps from DNA extraction to assembly of T-DNA loci can be completed within days. CONCLUSIONS: Long read sequencing was demonstrated to be an effective way to resolve complex T-DNA insertions and chromosome fusions. Many T-DNA insertions comprise not just a single T-DNA, but complex arrays of multiple T-DNAs. It is becoming obvious that T-DNA insertion alleles must be characterized by exact identification of both T-DNA::genome junctions to generate clear genotype-to-phenotype relations.
Affiliation(s)
- Boas Pucker
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, Germany
- Evolution and Diversity, Department of Plant Sciences, University of Cambridge, Cambridge, UK
| | - Nils Kleinbölting
- Bioinformatics Resource Facility, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, Germany
| | - Bernd Weisshaar
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, Germany
| |
|
19
|
Gillani R, Seong BKA, Crowdis J, Conway JR, Dharia NV, Alimohamed S, Haas BJ, Han K, Park J, Dietlein F, He MX, Imamovic A, Ma C, Bassik MC, Boehm JS, Vazquez F, Gusev A, Liu D, Janeway KA, McFarland JM, Stegmaier K, Van Allen EM. Gene Fusions Create Partner and Collateral Dependencies Essential to Cancer Cell Survival. Cancer Res 2021; 81:3971-3984. [PMID: 34099491 PMCID: PMC8338889 DOI: 10.1158/0008-5472.can-21-0791] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/10/2021] [Revised: 03/26/2021] [Accepted: 06/04/2021] [Indexed: 01/07/2023]
Abstract
Gene fusions frequently result from rearrangements in cancer genomes. In many instances, gene fusions play an important role in oncogenesis; in other instances, they are thought to be passenger events. Although regulatory element rearrangements and copy number alterations resulting from these structural variants are known to lead to transcriptional dysregulation across cancers, the extent to which these events result in functional dependencies with an impact on cancer cell survival is variable. Here we used CRISPR-Cas9 dependency screens to evaluate the fitness impact of 3,277 fusions across 645 cell lines from the Cancer Dependency Map. We found that 35% of cell lines harbored either a fusion partner dependency or a collateral dependency on a gene within the same topologically associating domain as a fusion partner. Fusion-associated dependencies revealed numerous novel oncogenic drivers and clinically translatable alterations. Broadly, fusions can result in partner and collateral dependencies that have biological and clinical relevance across cancer types. SIGNIFICANCE: This study provides insights into how fusions contribute to fitness in different cancer contexts beyond partner-gene activation events, identifying partner and collateral dependencies that may have direct implications for clinical care.
Affiliation(s)
- Riaz Gillani
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts.,Broad Institute of Harvard and MIT, Cambridge, Massachusetts.,Department of Pediatrics, Harvard Medical School, Boston, Massachusetts.,Boston Children's Hospital, Boston, Massachusetts
| | - Bo Kyung A. Seong
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts.,Broad Institute of Harvard and MIT, Cambridge, Massachusetts
| | - Jett Crowdis
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts.,Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Jake R. Conway
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts.,Harvard Medical School, Boston, Massachusetts
| | - Neekesh V. Dharia
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts.,Broad Institute of Harvard and MIT, Cambridge, Massachusetts.,Department of Pediatrics, Harvard Medical School, Boston, Massachusetts.,Boston Children's Hospital, Boston, Massachusetts
| | - Saif Alimohamed
- Wake Forest School of Medicine, Medical Center Boulevard, Winston-Salem, North Carolina
| | - Brian J. Haas
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts
| | - Kyuho Han
- Department of Genetics, Stanford University School of Medicine, Stanford, California
| | - Jihye Park
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Felix Dietlein
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Meng Xiao He
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Harvard Medical School, Boston, Massachusetts
| | - Alma Imamovic
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Clement Ma
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Michael C. Bassik
- Department of Genetics, Stanford University School of Medicine, Stanford, California; Program in Cancer Biology, Stanford University School of Medicine, Stanford, California; Program in Chemistry, Engineering and Medicine for Human Health (ChEM-H), Stanford University, Stanford, California
| | - Jesse S. Boehm
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts
| | | | - Alexander Gusev
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
| | - David Liu
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Katherine A. Janeway
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts; Boston Children's Hospital, Boston, Massachusetts
| | | | - Kimberly Stegmaier
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts; Boston Children's Hospital, Boston, Massachusetts
| | - Eliezer M. Van Allen
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Center for Cancer Genomics, Dana-Farber Cancer Institute, Boston, Massachusetts. Corresponding Author: Eliezer M. Van Allen, Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215. Phone: 617-632-6656; E-mail:
|
20
|
Liu H, Sun M, Pan H, Cheng T, Wang J, Zhang Q. Two Cyc2CL transcripts (Cyc2CL-1 and Cyc2CL-2) may play key roles in the petal and stamen development of ray florets in chrysanthemum. BMC PLANT BIOLOGY 2021; 21:105. [PMID: 33607954 PMCID: PMC7893774 DOI: 10.1186/s12870-021-02884-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/30/2020] [Accepted: 02/09/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND: Chrysanthemum morifolium is one of the most popular ornamental crops. The capitulum, which is the main ornamental part of chrysanthemum plants, consists of ligulate marginal ray florets, an attractive corolla (petals), and radially hermaphroditic disc florets, but no stamens. In Asteraceae species, the zygomorphic ray florets evolved from the actinomorphic disc florets. During this process, the zygomorphic ligulate corolla arose and the stamens were aborted. Although molecular genetic research has clarified ray floret development to some extent, the precise molecular mechanism underlying ray floret development in chrysanthemum remained unclear. RESULTS: A CYC2-like gene, Cyc2CL, was cloned from C. morifolium 'Fenditan'. Subsequent analyses revealed that the alternative splicing of Cyc2CL, which occurred in the flower differentiation stage, resulted in the production of Cyc2CL-1 and Cyc2CL-2 in the apical buds. Prior to this stage, only Cyc2CL-1 was produced in the apical buds. A fluorescence in situ hybridization analysis of labeled Cyc2CL-1 and Cyc2CL-2 RNA indicated that Cyc2CL-2 was first expressed in the involucre tissue during the final involucre differentiation stage, but was subsequently expressed in the receptacle and floret primordia as the floral bud differentiation stage progressed. Moreover, Cyc2CL-2 was highly expressed in the inflorescence tissue during the corolla formation stage, and the expression remained high until the end of the floral bud differentiation stage. Furthermore, the overexpression of Cyc2CL-1 and Cyc2CL-2 in transgenic Arabidopsis inhibited stamen and petal development. Therefore, both Cyc2CL-1 and Cyc2CL-2 encode candidate regulators of petal development and stamen abortion and are important for the ray floret development in chrysanthemum. CONCLUSION: In this study, we characterized the alternatively spliced transcripts of the CYC2-like gene that differ subtly regarding expression and function. The data presented herein will be useful for clarifying the regulatory mechanisms associated with the CYC2-like gene and may also be important for identifying the key genes and molecular mechanisms controlling the development of ray florets in chrysanthemum.
Affiliation(s)
- Hua Liu
- Beijing Key Laboratory of Ornamental Plants Germplasm Innovation & Molecular Breeding, National Engineering Research Center for Floriculture, Beijing Laboratory of Urban and Rural Ecological Environment, Engineering Research Center of Landscape Environment of Ministry of Education, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, School of Landscape Architecture, Beijing Forestry University, Beijing, 100083 China
| | - Ming Sun
- Beijing Key Laboratory of Ornamental Plants Germplasm Innovation & Molecular Breeding, National Engineering Research Center for Floriculture, Beijing Laboratory of Urban and Rural Ecological Environment, Engineering Research Center of Landscape Environment of Ministry of Education, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, School of Landscape Architecture, Beijing Forestry University, Beijing, 100083 China
| | - Huitang Pan
- Beijing Key Laboratory of Ornamental Plants Germplasm Innovation & Molecular Breeding, National Engineering Research Center for Floriculture, Beijing Laboratory of Urban and Rural Ecological Environment, Engineering Research Center of Landscape Environment of Ministry of Education, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, School of Landscape Architecture, Beijing Forestry University, Beijing, 100083 China
| | - Tangren Cheng
- Beijing Key Laboratory of Ornamental Plants Germplasm Innovation & Molecular Breeding, National Engineering Research Center for Floriculture, Beijing Laboratory of Urban and Rural Ecological Environment, Engineering Research Center of Landscape Environment of Ministry of Education, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, School of Landscape Architecture, Beijing Forestry University, Beijing, 100083 China
| | - Jia Wang
- Beijing Key Laboratory of Ornamental Plants Germplasm Innovation & Molecular Breeding, National Engineering Research Center for Floriculture, Beijing Laboratory of Urban and Rural Ecological Environment, Engineering Research Center of Landscape Environment of Ministry of Education, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, School of Landscape Architecture, Beijing Forestry University, Beijing, 100083 China
| | - Qixiang Zhang
- Beijing Key Laboratory of Ornamental Plants Germplasm Innovation & Molecular Breeding, National Engineering Research Center for Floriculture, Beijing Laboratory of Urban and Rural Ecological Environment, Engineering Research Center of Landscape Environment of Ministry of Education, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants of Ministry of Education, School of Landscape Architecture, Beijing Forestry University, Beijing, 100083 China
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing, 100083 China
|
21
|
Poverennaya IV, Roytberg MA. Spliceosomal Introns: Features, Functions, and Evolution. BIOCHEMISTRY (MOSCOW) 2021; 85:725-734. [PMID: 33040717 DOI: 10.1134/s0006297920070019] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 11/23/2022]
Abstract
Spliceosomal introns, which have been found in most eukaryotic genes, are non-coding sequences excised from pre-mRNAs by a special complex called the spliceosome during mRNA splicing. Introns occur in both protein- and RNA-coding genes and can be found in coding and untranslated gene regions. Because intron sequences vary greatly due to a high rate of polymorphism, the functions of introns were for a long time associated only with alternative splicing, while intron evolution was viewed not as the evolution of an individual genomic element but within the framework of the evolution of the gene's intron-exon structure. Here, we review the theories of intron origin; evolutionary events in the exon-intron structure, such as intron gain, loss, and sliding; intron functions known to date; and mechanisms by which changes in intron features (length and phase) can affect the regulation of gene-mediated processes.
Affiliation(s)
- I V Poverennaya
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia; Institute of Mathematical Problems in Biology, Keldysh Branch of Institute of Applied Mathematics, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia
| | - M A Roytberg
- Institute of Mathematical Problems in Biology, Keldysh Branch of Institute of Applied Mathematics, Russian Academy of Sciences, Pushchino, Moscow Region, 142290, Russia; Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russia; Higher School of Economics, Moscow, 101000, Russia
|
22
|
Sielemann K, Hafner A, Pucker B. The reuse of public datasets in the life sciences: potential risks and rewards. PeerJ 2020; 8:e9954. [PMID: 33024631 PMCID: PMC7518187 DOI: 10.7717/peerj.9954] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 04/30/2020] [Accepted: 08/25/2020] [Indexed: 12/13/2022]
Abstract
The 'big data' revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define 'successful reuse' as the use of previously published data to enable novel scientific findings. By using selected examples of successful reuse from different disciplines, we illustrate the enormous potential of the practice, while acknowledging the respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of this practice as a norm has the potential to benefit all stakeholders in the life sciences.
Affiliation(s)
- Katharina Sielemann
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, Bielefeld, Germany
| | - Alenka Hafner
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Current Affiliation: Intercollege Graduate Degree Program in Plant Biology, Penn State University, University Park, State College, PA, United States of America
| | - Boas Pucker
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Evolution and Diversity, Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
|
23
|
Pucker B, Reiher F, Schilbert HM. Automatic Identification of Players in the Flavonoid Biosynthesis with Application on the Biomedicinal Plant Croton tiglium. PLANTS (BASEL, SWITZERLAND) 2020; 9:E1103. [PMID: 32867203 PMCID: PMC7570183 DOI: 10.3390/plants9091103] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 06/30/2020] [Revised: 08/11/2020] [Accepted: 08/25/2020] [Indexed: 02/06/2023]
Abstract
The flavonoid biosynthesis is a well-characterised model system for specialised metabolism and transcriptional regulation in plants. Flavonoids have numerous biological functions such as UV protection and pollinator attraction, but also biotechnological potential. Here, we present Knowledge-based Identification of Pathway Enzymes (KIPEs) as an automatic approach for the identification of players in the flavonoid biosynthesis. KIPEs combines comprehensive sequence similarity analyses with the inspection of functionally relevant amino acid residues and domains in subjected peptide sequences. Comprehensive sequence sets of flavonoid biosynthesis enzymes and knowledge about functionally relevant amino acids were collected. As a proof of concept, KIPEs was applied to investigate the flavonoid biosynthesis of the medicinal plant Croton tiglium on the basis of a transcriptome assembly. Enzyme candidates for all steps in the biosynthesis network were identified and matched to previous reports of corresponding metabolites in Croton species.
Affiliation(s)
- Boas Pucker
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Department of Plant Sciences, Evolution and Diversity, University of Cambridge, Cambridge CB2 3EA, UK
| | - Franziska Reiher
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
| | - Hanna Marie Schilbert
- Genetics and Genomics of Plants, CeBiTec & Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
|
24
|
McGarvey P, Huang J, McCoy M, Orvis J, Katsir Y, Lotringer N, Nesher I, Kavarana M, Sun M, Peet R, Meiri D, Madhavan S. De novo assembly and annotation of transcriptomes from two cultivars of Cannabis sativa with different cannabinoid profiles. Gene 2020; 762:145026. [PMID: 32781193 DOI: 10.1016/j.gene.2020.145026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/01/2020] [Accepted: 07/31/2020] [Indexed: 10/23/2022]
Abstract
Cannabis has been cultivated for millennia for medicinal, industrial and recreational uses. Our long-term goal is to compare the transcriptomes of cultivars with different cannabinoid profiles for therapeutic purposes. Here we describe the de novo assembly, annotation and initial analysis of two cultivars of Cannabis, a high THC variety and a CBD plus THC variety. Cultivars were grown under different lighting conditions; flower buds were sampled over 71 days. Cannabinoid profiles were determined by ESI-LC/MS. RNA samples were sequenced using the HiSeq4000 platform. Transcriptomes were assembled using the DRAP pipeline and annotated using the BLAST2GO pipeline and other tools. Each transcriptome contained over twenty thousand protein encoding transcripts with ORFs and flanking sequence. Identification of transcripts for cannabinoid pathway and related enzymes showed full-length ORFs that align with the draft genomes of the Purple Kush and Finola cultivars. Two transcripts were found for olivetolic acid cyclase (OAC) that mapped to distinct locations on the Purple Kush genome suggesting multiple genes for OAC are expressed in some cultivars. The ability to make high quality annotated reference transcriptomes in Cannabis or other plants can promote rapid comparative analysis between cultivars and growth conditions in Cannabis and other organisms without annotated genome assemblies.
Affiliation(s)
- Peter McGarvey
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA.
| | - Jiahao Huang
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA.
| | - Matthew McCoy
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA.
| | - Joshua Orvis
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Yael Katsir
- Technion - Israel Institute of Technology, Haifa, Israel
| | | | | | | | - Mingyang Sun
- Teewinot Life Sciences Corporation, Tampa, FL, USA
| | | | - David Meiri
- Technion - Israel Institute of Technology, Haifa, Israel
| | - Subha Madhavan
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA.
|
25
|
Siadjeu C, Pucker B, Viehöver P, Albach DC, Weisshaar B. High Contiguity De Novo Genome Sequence Assembly of Trifoliate Yam (Dioscorea dumetorum) Using Long Read Sequencing. Genes (Basel) 2020; 11:E274. [PMID: 32143301 PMCID: PMC7140821 DOI: 10.3390/genes11030274] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 01/31/2020] [Revised: 02/25/2020] [Accepted: 02/29/2020] [Indexed: 12/17/2022]
Abstract
Trifoliate yam (Dioscorea dumetorum) is one example of an orphan crop, not traded internationally. Post-harvest hardening of the tubers of this species starts within 24 h after harvesting and renders the tubers inedible. Genomic resources are required for D. dumetorum to improve breeding for non-hardening varieties as well as for other traits. We sequenced the D. dumetorum genome and generated the corresponding annotation. The two haplophases of this highly heterozygous genome were separated to a large extent. The assembly represents 485 Mbp of the genome with an N50 of over 3.2 Mbp. A total of 35,269 protein-encoding gene models as well as 9941 non-coding RNA genes were predicted, and functional annotations were assigned.
Affiliation(s)
- Christian Siadjeu
- Institute for Biology and Environmental Sciences, Biodiversity and Evolution of Plants, Carl-von-Ossietzky University Oldenburg, Carl-von-Ossietzky Str. 9-11, 26111 Oldenburg, Germany
- Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
| | - Boas Pucker
- Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
- Molecular Genetics and Physiology of Plants, Faculty of Biology and Biotechnology, Ruhr-University Bochum, Universitätsstraße 150, 44801 Bochum, Germany
| | - Prisca Viehöver
- Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
| | - Dirk C. Albach
- Institute for Biology and Environmental Sciences, Biodiversity and Evolution of Plants, Carl-von-Ossietzky University Oldenburg, Carl-von-Ossietzky Str. 9-11, 26111 Oldenburg, Germany
| | - Bernd Weisshaar
- Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
|
26
|
Frey K, Pucker B. Animal, Fungi, and Plant Genome Sequences Harbor Different Non-Canonical Splice Sites. Cells 2020; 9:E458. [PMID: 32085510 PMCID: PMC7072748 DOI: 10.3390/cells9020458] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/20/2020] [Revised: 02/11/2020] [Accepted: 02/14/2020] [Indexed: 11/17/2022]
Abstract
Most protein-encoding genes in eukaryotes contain introns, which are interwoven with exons. Introns need to be removed from initial transcripts in order to generate the final messenger RNA (mRNA), which can be translated into an amino acid sequence. Precise excision of introns by the spliceosome requires conserved dinucleotides, which mark the splice sites. However, there are variations of the highly conserved combination of GT at the 5' end and AG at the 3' end of an intron in the genome. GC-AG and AT-AC are two major non-canonical splice site combinations, which have been known for years. Recently, various minor non-canonical splice site combinations were detected with numerous dinucleotide permutations. Here, we expand systematic investigations of non-canonical splice site combinations in plants across eukaryotes by analyzing fungal and animal genome sequences. Comparisons of splice site combinations between these three kingdoms revealed several differences, such as an apparently increased CT-AC frequency in fungal genome sequences. Canonical GT-AG splice site combinations in antisense transcripts are a likely explanation for this observation, thus indicating annotation errors. In addition, high numbers of GA-AG splice site combinations were observed in Eurytemora affinis and Oikopleura dioica. A variant in one U1 small nuclear RNA (snRNA) isoform might allow the recognition of GA as a 5' splice site. In-depth investigation of splice site usage based on RNA-Seq read mappings indicates a generally higher flexibility of the 3' splice site compared to the 5' splice site across animals, fungi, and plants.
Affiliation(s)
- Katharina Frey
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec), Bielefeld University, 33615 Bielefeld, Germany
- Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615 Bielefeld, Germany
| | - Boas Pucker
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec), Bielefeld University, 33615 Bielefeld, Germany
- Molecular Genetics and Physiology of Plants, Faculty of Biology and Biotechnology, Ruhr-University Bochum, Universitätsstraße 150, 44801 Bochum, Germany
|
27
|
Bhayana L, Paritosh K, Arora H, Yadava SK, Singh P, Nandan D, Mukhopadhyay A, Gupta V, Pradhan AK, Pental D. A Mapped Locus on LG A6 of Brassica juncea Line Tumida Conferring Resistance to White Rust Contains a CNL Type R Gene. FRONTIERS IN PLANT SCIENCE 2020; 10:1690. [PMID: 31998351 PMCID: PMC6960627 DOI: 10.3389/fpls.2019.01690] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/09/2019] [Accepted: 11/29/2019] [Indexed: 05/26/2023]
Abstract
White rust, caused by the oomycete Albugo candida, is a significant disease of the cultivated Brassica species. The Indian gene pool lines of oilseed mustard, Brassica juncea, are highly susceptible to the pathogen. Resistance to A. candida has been reported in the east European gene pool lines of mustard and mapped to LG A4 in line Heera and LG A5 in line Donskaja-IV. A new resistance-conferring locus to A. candida isolate AcB1 has been mapped to LG A6 of B. juncea line Tumida, a Chinese vegetable type mustard, using an F1DH mapping population that has been developed from a Tumida × Varuna (susceptible Indian gene pool line) cross. A molecular map containing 8,303 genic and GBS markers was used to map the resistance trait to an interval of 63.0 cM-70.8 cM on LG A6. Genome assemblies of Tumida and Varuna were used to find the genes present within the flanking markers discerned by genetic mapping. The most likely candidate gene in the mapped interval is BjuA046215, a CC-NBS-LRR (CNL) type R gene that encodes a protein with all the specific subdomains of the proteins encoded by such genes. Alleles of BjuA046215 in Varuna and other lines of the Indian and the east European gene pools encode proteins that have truncated LRR domains. Analysis of the syntenic regions in some of the Brassicaceae genomes and phylogenetic analysis of CNL type R genes showed BjuA046215 to be closely related to a recently described white rust resistance-conferring R gene BjuWRR1 in B. juncea Donskaja-IV, both belonging to the CNL-D group of R genes. Related R genes in Arabidopsis thaliana confer resistance to another oomycete, Peronospora parasitica.
Affiliation(s)
- Latika Bhayana
- Department of Genetics, University of Delhi South Campus, New Delhi, India
| | - Kumar Paritosh
- Centre for Genetic Manipulation of Crop Plants, University of Delhi South Campus, New Delhi, India
| | - Heena Arora
- Department of Genetics, University of Delhi South Campus, New Delhi, India
| | - Satish Kumar Yadava
- Centre for Genetic Manipulation of Crop Plants, University of Delhi South Campus, New Delhi, India
| | - Priyansha Singh
- Department of Genetics, University of Delhi South Campus, New Delhi, India
| | - Divakar Nandan
- Centre for Genetic Manipulation of Crop Plants, University of Delhi South Campus, New Delhi, India
| | - Arundhati Mukhopadhyay
- Centre for Genetic Manipulation of Crop Plants, University of Delhi South Campus, New Delhi, India
| | - Vibha Gupta
- Centre for Genetic Manipulation of Crop Plants, University of Delhi South Campus, New Delhi, India
| | - Akshay Kumar Pradhan
- Department of Genetics, University of Delhi South Campus, New Delhi, India
- Centre for Genetic Manipulation of Crop Plants, University of Delhi South Campus, New Delhi, India
| | - Deepak Pental
- Centre for Genetic Manipulation of Crop Plants, University of Delhi South Campus, New Delhi, India
|
28
|
Pucker B, Holtgräwe D, Stadermann KB, Frey K, Huettel B, Reinhardt R, Weisshaar B. A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PLoS One 2019; 14:e0216233. [PMID: 31112551 PMCID: PMC6529160 DOI: 10.1371/journal.pone.0216233] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 12/12/2018] [Accepted: 04/16/2019] [Indexed: 01/27/2023]
Abstract
In addition to the BAC-based reference sequence of the accession Columbia-0 from the year 2000, several short read assemblies of THE plant model organism Arabidopsis thaliana were published during the last years. Also, a SMRT-based assembly of Landsberg erecta has been generated that identified translocation and inversion polymorphisms between two genotypes of the species. Here we provide a chromosome-arm level assembly of the A. thaliana accession Niederzenz-1 (AthNd-1_v2c) based on SMRT sequencing data. The best assembly comprises 69 nucleome sequences and displays a contig length of up to 16 Mbp. Compared to an earlier Illumina short read-based NGS assembly (AthNd-1_v1), a 75 fold increase in contiguity was observed for AthNd-1_v2c. To assign contig locations independent from the Col-0 gold standard reference sequence, we used genetic anchoring to generate a de novo assembly. In addition, we assembled the chondrome and plastome sequences. Detailed analyses of AthNd-1_v2c allowed reliable identification of large genomic rearrangements between A. thaliana accessions contributing to differences in the gene sets that distinguish the genotypes. One of the differences detected identified a gene that is lacking from the Col-0 gold standard sequence. This de novo assembly extends the known proportion of the A. thaliana pan-genome.
Affiliation(s)
- Boas Pucker
- Bielefeld University, Faculty of Biology & Center for Biotechnology, Bielefeld, Germany
| | - Daniela Holtgräwe
- Bielefeld University, Faculty of Biology & Center for Biotechnology, Bielefeld, Germany
| | - Kai Bernd Stadermann
- Bielefeld University, Faculty of Biology & Center for Biotechnology, Bielefeld, Germany
| | - Katharina Frey
- Bielefeld University, Faculty of Biology & Center for Biotechnology, Bielefeld, Germany
| | - Bruno Huettel
- Max Planck Genome Centre Cologne, Max Planck Institute for Plant Breeding Research, Cologne, Germany
| | - Richard Reinhardt
- Max Planck Genome Centre Cologne, Max Planck Institute for Plant Breeding Research, Cologne, Germany
| | - Bernd Weisshaar
- Bielefeld University, Faculty of Biology & Center for Biotechnology, Bielefeld, Germany
|