Reference Citation Analysis

For: Picardi E, Pesole G. Computational methods for ab initio and comparative gene finding. Methods Mol Biol 2010;609:269-84. [PMID: 20221925 DOI: 10.1007/978-1-60327-241-4_16] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 12/03/2022]

Cited by Other Article(s)

Chen H, Liu Y, Balabani S, Hirayama R, Huang J. Machine Learning in Predicting Printable Biomaterial Formulations for Direct Ink Writing. RESEARCH (WASHINGTON, D.C.) 2023;6:0197. [PMID: 37469394 PMCID: PMC10353544 DOI: 10.34133/research.0197] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 04/06/2023] [Accepted: 06/29/2023] [Indexed: 07/21/2023]

Wang Y, Cai X, Hu S, Qin S, Wang Z, Cao Y, Hou C, Yang J, Zhou W. Comparative genomic analysis provides insight into the phylogeny and potential mechanisms of adaptive evolution of Sphingobacterium sp. CZ-2. Gene 2023;855:147118. [PMID: 36521669 DOI: 10.1016/j.gene.2022.147118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 12/14/2022]

Affiliation(s)

Yongqiang Wang Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
Xunhui Cai School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
Shengnan Hu Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
Sidong Qin Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
Ziqi Wang Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
Yixiang Cao Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
Chaoliang Hou Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
Jiangshan Yang Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
Wei Zhou Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China

Patra P, B R D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnol Adv 2023;62:108069. [PMID: 36442697 DOI: 10.1016/j.biotechadv.2022.108069] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 07/17/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]

Abstract

Metabolic engineering encompasses several widely used strategies that now hold a prominent place in biotechnology, with their potential demonstrated by a wealth of research and commercial products that have strong societal impact. The genomic revolution that began almost three decades ago initiated the generation of large omics datasets, which have helped in gaining a better understanding of cellular behavior. Metabolic engineering based on these large datasets has given researchers detailed insights into, and a reasonable understanding of, the intricacies of biosystems. However, the existing trial-and-error approaches to metabolic engineering are laborious and time-intensive when the goal is high-yield production of target compounds through genetic manipulation of host organisms. Machine learning (ML), coupled with the available metabolic engineering test instances and omics data, offers a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models that provide accurate predictions for gene circuit design, protein modification, optimization of bioprocess parameters for scale-up, and screening of robust, hyper-producing cell factories. This review briefly describes the premise of ML, then surveys various ML methods and algorithms alongside the numerous omics datasets available to train ML models for predicting metabolic outcomes with high accuracy. The interplay between ML algorithms and biological datasets through knowledge engineering has guided recent advances in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering. Finally, this review addresses the likely challenges of applying ML in metabolic engineering, which will guide researchers toward novel techniques to overcome these limitations.

Karanth S, Tanui CK, Meng J, Pradhan AK. Exploring the predictive capability of advanced machine learning in identifying severe disease phenotype in Salmonella enterica. Food Res Int 2022;151:110817. [PMID: 34980422 DOI: 10.1016/j.foodres.2021.110817] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/17/2021] [Revised: 11/12/2021] [Accepted: 11/17/2021] [Indexed: 11/26/2022]

Ejigu GF, Yi G, Kim JI, Jung J. ReGSP: a visualized application for homology-based gene searching and plotting using multiple reference sequences. PeerJ 2021;9:e12707. [PMID: 35036172 PMCID: PMC8710255 DOI: 10.7717/peerj.12707] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/07/2021] [Accepted: 12/07/2021] [Indexed: 12/17/2022]

Wang Q, Kille B, Liu TR, Elworth RAL, Treangen TJ. PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment. Nat Commun 2021;12:1167. [PMID: 33637701 PMCID: PMC7910462 DOI: 10.1038/s41467-021-21180-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/22/2020] [Accepted: 01/12/2021] [Indexed: 12/26/2022]

Coding Exon-Structure Aware Realigner (CESAR): Utilizing Genome Alignments for Comparative Gene Annotation. Methods Mol Biol 2019;1962:179-191. [PMID: 31020560 DOI: 10.1007/978-1-4939-9173-0_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/17/2023]

Artificial intelligence used in genome analysis studies. EUROBIOTECH JOURNAL 2018. [DOI: 10.2478/ebtj-2018-0012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/15/2022]

Sharma V, Hiller M. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res 2017. [PMID: 28645144 PMCID: PMC5737078 DOI: 10.1093/nar/gkx554] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 01/23/2023]

Making sense of genomes of parasitic worms: Tackling bioinformatic challenges. Biotechnol Adv 2016;34:663-686. [DOI: 10.1016/j.biotechadv.2016.03.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 11/05/2015] [Revised: 02/25/2016] [Accepted: 03/01/2016] [Indexed: 01/25/2023]

Sharma V, Elghafari A, Hiller M. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation. Nucleic Acids Res 2016;44:e103. [PMID: 27016733 PMCID: PMC4914097 DOI: 10.1093/nar/gkw210] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 02/02/2016] [Revised: 03/04/2016] [Accepted: 03/18/2016] [Indexed: 12/03/2022]

Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16:321-32. [PMID: 25948244 PMCID: PMC5204302 DOI: 10.1038/nrg3920] [Citation(s) in RCA: 889] [Impact Index Per Article: 88.9] [Indexed: 01/18/2023]

Chen ZX, Sturgill D, Qu J, Jiang H, Park S, Boley N, Suzuki AM, Fletcher AR, Plachetzki DC, FitzGerald PC, Artieri CG, Atallah J, Barmina O, Brown JB, Blankenburg KP, Clough E, Dasgupta A, Gubbala S, Han Y, Jayaseelan JC, Kalra D, Kim YA, Kovar CL, Lee SL, Li M, Malley JD, Malone JH, Mathew T, Mattiuzzo NR, Munidasa M, Muzny DM, Ongeri F, Perales L, Przytycka TM, Pu LL, Robinson G, Thornton RL, Saada N, Scherer SE, Smith HE, Vinson C, Warner CB, Worley KC, Wu YQ, Zou X, Cherbas P, Kellis M, Eisen MB, Piano F, Kionte K, Fitch DH, Sternberg PW, Cutter AD, Duff MO, Hoskins RA, Graveley BR, Gibbs RA, Bickel PJ, Kopp A, Carninci P, Celniker SE, Oliver B, Richards S. Comparative validation of the D. melanogaster modENCODE transcriptome annotation. Genome Res 2014;24:1209-23. [PMID: 24985915 PMCID: PMC4079975 DOI: 10.1101/gr.159384.113] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Indexed: 01/08/2023]

Affiliation(s)

Zhen-Xia Chen National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
David Sturgill National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
Jiaxin Qu Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Huaiyang Jiang Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Soo Park Department of Genome Dynamics, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
Nathan Boley Department of Statistics, University of California, Berkeley, California 94720, USA
Ana Maria Suzuki Technology Development Group, RIKEN Omics Science Center and RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama City, Kanagawa, Japan 230-0045
Anthony R Fletcher Division of Computational Bioscience, Center For Information Technology, National Institutes of Health, Bethesda, Maryland 20814, USA
David C Plachetzki Department of Evolution and Ecology, University of California, Davis, California 95616, USA
Peter C FitzGerald National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
Carlo G Artieri National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
Joel Atallah Department of Evolution and Ecology, University of California, Davis, California 95616, USA
Olga Barmina Department of Evolution and Ecology, University of California, Davis, California 95616, USA
James B Brown Department of Statistics, University of California, Berkeley, California 94720, USA
Kerstin P Blankenburg Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Emily Clough National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
Abhijit Dasgupta Clinical Trials and Outcomes Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
Sai Gubbala Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Yi Han Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Joy C Jayaseelan Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Divya Kalra Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Yoo-Ah Kim National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20892, USA
Christie L Kovar Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Sandra L Lee Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Mingmei Li Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
James D Malley Division of Computational Bioscience, Center For Information Technology, National Institutes of Health, Bethesda, Maryland 20814, USA
John H Malone National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
Tittu Mathew Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Nicolas R Mattiuzzo National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
Mala Munidasa Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Donna M Muzny Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Fiona Ongeri Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Lora Perales Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Teresa M Przytycka National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20892, USA
Ling-Ling Pu Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Garrett Robinson Department of Statistics, University of California, Berkeley, California 94720, USA
Rebecca L Thornton Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Nehad Saada Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Steven E Scherer Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Harold E Smith National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
Charles Vinson National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
Crystal B Warner Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Kim C Worley Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Yuan-Qing Wu Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Xiaoyan Zou Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Peter Cherbas Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
Manolis Kellis Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
Michael B Eisen Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
Fabio Piano Department of Biology, New York University, New York, New York 10003, USA
Karin Kionte Department of Biology, New York University, New York, New York 10003, USA
David H Fitch Department of Biology, New York University, New York, New York 10003, USA
Paul W Sternberg HHMI and Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
Asher D Cutter Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 3B2, Canada
Michael O Duff Department of Genetics and Developmental Biology, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA
Roger A Hoskins Department of Genome Dynamics, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
Brenton R Graveley Department of Genetics and Developmental Biology, Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut 06030-6403, USA
Richard A Gibbs Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
Peter J Bickel Department of Statistics, University of California, Berkeley, California 94720, USA
Artyom Kopp Department of Evolution and Ecology, University of California, Davis, California 95616, USA
Piero Carninci Technology Development Group, RIKEN Omics Science Center and RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama City, Kanagawa, Japan 230-0045
Susan E Celniker Department of Genome Dynamics, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
Brian Oliver National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
Stephen Richards Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA

Curran DM, Gilleard JS, Wasmuth JD. Figmop: a profile HMM to identify genes and bypass troublesome gene models in draft genomes. Bioinformatics 2014;30:3266-7. [PMID: 25115706 DOI: 10.1093/bioinformatics/btu544] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]

Robert C, Fuentes-Utrilla P, Troup K, Loecherbach J, Turner F, Talbot R, Archibald AL, Mileham A, Deeb N, Hume DA, Watson M. Design and development of exome capture sequencing for the domestic pig (Sus scrofa). BMC Genomics 2014;15:550. [PMID: 24988888 PMCID: PMC4099480 DOI: 10.1186/1471-2164-15-550] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 02/13/2014] [Accepted: 06/19/2014] [Indexed: 12/30/2022]

van der Burgt A, Severing E, Collemare J, de Wit PJGM. Automated alignment-based curation of gene models in filamentous fungi. BMC Bioinformatics 2014;15:19. [PMID: 24433567 PMCID: PMC3898260 DOI: 10.1186/1471-2105-15-19] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/15/2013] [Accepted: 01/11/2014] [Indexed: 11/16/2022]

Abstract

Background

Automated gene calling is still an error-prone process, particularly for the highly plastic genomes of fungal species. Improving gene models through quality control and manual curation is time-consuming, requires skilled biologists, and is performed only to a limited extent. The wealth of available fungal genomes has not yet been exploited by an automated method that applies quality control to gene models in order to obtain more accurate genome annotations.

Results

We provide a novel method, alignment-based fungal gene prediction (ABFGP), that is particularly suitable for plastic genomes like those of fungi. It assesses gene models on a gene-by-gene basis using informant gene loci. Its performance was benchmarked on 6,965 gene models confirmed by full-length unigenes from ten different fungi; ABFGP correctly predicted 79.4% of all gene models. It improves on the output of ab initio gene prediction software owing to higher sensitivity and precision for all gene model components. The method's applicability was shown by revisiting the annotations of six different fungi, using gene loci from up to 29 fungal genomes as informants. Between 7,231 and 8,337 genes were assessed by ABFGP, and for each genome between 1,724 and 3,505 gene model revisions were proposed. The reliability of the proposed gene models is assessed by an a posteriori introspection of each intron and exon in the multiple gene model alignment. The total number and types of proposed gene model revisions in the six fungal genomes correlate with the quality of the genome assembly and with the sequencing strategies used at the sequencing centre, highlighting different types of errors in different annotation pipelines. ABFGP is particularly successful at discovering sequence errors and/or disruptive mutations that cause truncated and erroneous gene models.

Conclusions

The ABFGP method is an accurate, fully automated quality control method for fungal gene catalogues that can easily be integrated into existing annotation pipelines. With the exponential growth in newly released genomes, ABFGP will help decrease the number of gene models that require additional manual curation.
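The abstract reports gains in sensitivity and precision for gene model components. As a hedged illustration only, the sketch below computes exon-level sensitivity and precision under one common convention: an exon counts as correct only if both its start and end coordinates exactly match a reference exon. The ABFGP benchmark may define matches differently, and the function name and (start, end) tuple representation are hypothetical.

```python
from typing import Iterable, Tuple

Exon = Tuple[int, int]  # hypothetical (start, end) coordinate pair


def exon_sensitivity_precision(predicted: Iterable[Exon],
                               reference: Iterable[Exon]) -> Tuple[float, float]:
    """Exon-level sensitivity and precision with exact-boundary matching."""
    pred = set(predicted)
    ref = set(reference)
    true_pos = len(pred & ref)  # predicted exons whose both boundaries match
    sensitivity = true_pos / len(ref) if ref else 0.0   # TP / (TP + FN)
    precision = true_pos / len(pred) if pred else 0.0   # TP / (TP + FP)
    return sensitivity, precision


# Hypothetical example: the second predicted exon has a wrong end coordinate.
reference = [(1, 100), (200, 300), (400, 500)]
predicted = [(1, 100), (200, 310), (400, 500)]
sn, pr = exon_sensitivity_precision(predicted, reference)
print(f"sensitivity={sn:.3f} precision={pr:.3f}")
```

A single misplaced boundary costs one true positive on both sides, so it lowers sensitivity and precision together; an extra spurious exon, by contrast, lowers precision only.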

ASPic-GeneID: a lightweight pipeline for gene prediction and alternative isoforms detection. BIOMED RESEARCH INTERNATIONAL 2013;2013:502827. [PMID: 24308000 PMCID: PMC3838850 DOI: 10.1155/2013/502827] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/16/2013] [Revised: 08/01/2013] [Accepted: 08/04/2013] [Indexed: 12/31/2022]

Kumar S, Koutsovoulos G, Kaur G, Blaxter M. Toward 959 nematode genomes. WORM 2013;1:42-50. [PMID: 24058822 PMCID: PMC3670170 DOI: 10.4161/worm.19046] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 12/02/2022]

Searls DB. A primer in macromolecular linguistics. Biopolymers 2012;99:203-17. [PMID: 23034580 DOI: 10.1002/bip.22101] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 04/26/2012] [Accepted: 05/25/2012] [Indexed: 01/01/2023]

Pohl M, Theissen G, Schuster S. GC content dependency of open reading frame prediction via stop codon frequencies. Gene 2012;511:441-6. [PMID: 23000023 DOI: 10.1016/j.gene.2012.09.031] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 12/22/2011] [Revised: 04/27/2012] [Accepted: 09/05/2012] [Indexed: 11/18/2022]

Abstract

A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code, 3 of the 64 trinucleotides are stop codons. Hence, in random or non-coding DNA one can expect roughly every 21st trinucleotide to match a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been exploited for gene prediction, in particular for detecting protein-coding ORFs. Traditional methods based on stop codon frequency assume a GC content of about 50%. However, many genomes deviate significantly from that value. With the presented method we can describe how GC content affects the selection of appropriate length thresholds for potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing an ORF of that length in a random sequence. Thus, we can derive the maximum GC content at which ORF length is still practicable as a feature for gene prediction, together with the resulting false positive rates. A rough estimate for this upper limit is a GC content of 80%. The estimate can be made more precise by including further parameters and by also taking start codons into account. We demonstrate the feasibility of the method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variation in line with our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to GC contents other than 50%. Usually, several gene-finding methods need to be combined, so our results concern one specific component within a package of methods. Interestingly, for genomes with low GC content, such as that of R. prowazekii, the presented method gives remarkably good results even when applied alone.
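The GC-content argument above can be made concrete with a small calculation. The sketch below assumes independent nucleotides with symmetric base composition (P(G) = P(C) = GC/2, P(A) = P(T) = (1 - GC)/2) and treats each in-frame codon as a stop independently, so random ORF lengths follow a geometric distribution. This is an illustrative approximation, not necessarily the exact model of Pohl et al., and all function names are hypothetical.

```python
import math


def stop_codon_prob(gc: float) -> float:
    """P(random trinucleotide is TAA, TAG or TGA) under independent nucleotides."""
    at = (1.0 - gc) / 2.0  # probability of A, and likewise of T
    g = gc / 2.0           # probability of G
    taa = at * at * at
    tag = at * at * g
    tga = at * g * at
    return taa + tag + tga


def prob_random_orf_at_least(codons: int, gc: float) -> float:
    """P(`codons` consecutive random in-frame codons contain no stop codon)."""
    return (1.0 - stop_codon_prob(gc)) ** codons


def min_orf_length(gc: float, alpha: float) -> int:
    """Smallest length L (codons) with prob_random_orf_at_least(L, gc) <= alpha.

    Requires 0 < gc < 1 and 0 < alpha < 1 (at gc = 1 no stop codon can occur).
    """
    p = stop_codon_prob(gc)
    return math.ceil(math.log(alpha) / math.log(1.0 - p))


for gc in (0.3, 0.5, 0.7, 0.8):
    print(f"GC={gc:.1f}  P(stop)={stop_codon_prob(gc):.4f}  "
          f"threshold at 1% per-ORF false positives: {min_orf_length(gc, 0.01)} codons")
```

At 50% GC the model gives exactly 3/64 (about one stop per 21 trinucleotides). Because all three stop codons are AT-rich, the stop frequency falls as GC content rises, so the length threshold needed to reject random ORFs grows quickly, which is consistent with the abstract's point that ORF length loses discriminative power in GC-rich genomes.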

Boerjan B, Cardoen D, Verdonck R, Caers J, Schoofs L. Insect omics research coming of age. (This review is part of a virtual symposium on recent advances in understanding a variety of complex regulatory processes in insect physiology and endocrinology, including development, metabolism, cold hardiness, food intake and digestion, and diuresis, through the use of omics technologies in the postgenomic era.) CAN J ZOOL 2012. [DOI: 10.1139/z2012-010] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/11/2022]

Gilchrist MJ. From expression cloning to gene modeling: the development of Xenopus gene sequence resources. Genesis 2012;50:143-54. [PMID: 22344767 PMCID: PMC3488295 DOI: 10.1002/dvg.22008] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 10/28/2011] [Revised: 12/09/2011] [Accepted: 12/21/2011] [Indexed: 11/08/2022]

Shepard SS, McSweeny A, Serpen G, Fedorov A. Exploiting mid-range DNA patterns for sequence classification: binary abstraction Markov models. Nucleic Acids Res 2012;40:4765-73. [PMID: 22344692 PMCID: PMC3367190 DOI: 10.1093/nar/gks154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/30/2023]

Hamada M, Asai K. A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA). J Comput Biol 2012;19:532-49. [PMID: 22313125 DOI: 10.1089/cmb.2011.0197] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]

Kuraku S, Meyer A. Detection and phylogenetic assessment of conserved synteny derived from whole genome duplications. Methods Mol Biol 2012;855:385-95. [PMID: 22407717 DOI: 10.1007/978-1-61779-582-4_14] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]

Hatje K, Keller O, Hammesfahr B, Pillmann H, Waack S, Kollmar M. Cross-species protein sequence and gene structure prediction with fine-tuned Webscipio 2.0 and Scipio. BMC Res Notes 2011;4:265. [PMID: 21798037 PMCID: PMC3162530 DOI: 10.1186/1756-0500-4-265] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 06/03/2011] [Accepted: 07/28/2011] [Indexed: 11/10/2022]

Abstract

Background

Obtaining transcripts of homologs from closely related organisms and reconstructing the exon-intron patterns of their genes is an important step when analyzing the evolution of a protein family or comparing the exon-intron structure of a gene across species. Because genome sequencing keeps getting faster, the gap between sequencing and genome annotation is growing. Thus, tools for correctly predicting and reconstructing genes in related organisms are becoming increasingly important. The tool Scipio, which can also be used via the graphical interface WebScipio, processes the significant hits in the output of the Blat program to account for sequencing errors, missing sequence, and fragmented genome assemblies. However, Scipio has so far been limited to high sequence similarity and unable to reconstruct short exons.

Results

Scipio and WebScipio have been fundamentally extended to better reconstruct very short exons and intron splice sites and to be better suited for cross-species gene structure prediction. The Needleman-Wunsch algorithm has been implemented to search for short parts of the query sequence that Blat did not recognize. These regions may be short exons, divergent sequence at intron splice sites, or very divergent exons. We have shown the benefit and use of the new parameters with several protein examples from entirely different protein families, searched against species from several eukaryotic kingdoms. The performance of the new Scipio version has been tested against several similar tools.

Conclusions

With the new version of Scipio, very short exons, both terminal and internal, even of just one amino acid, can be correctly reconstructed. Scipio also correctly predicts almost all genes in cross-species searches, even when the species diverged more than 100 Myr ago and protein sequence identity is below 80%. For our test cases, Scipio outperforms all other software tested. WebScipio has been restructured and provides easy access to the genome assemblies of about 640 eukaryotic species. Scipio and WebScipio are freely accessible at http://www.webscipio.org.
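The Scipio abstract names the Needleman-Wunsch algorithm for recovering short query segments that Blat missed. As a minimal sketch of the algorithm itself, the code below performs plain global alignment of two strings with unit match/mismatch scores and a linear gap penalty, then traces back one optimal alignment. It does not reproduce Scipio's actual scoring (protein-to-genome comparison, splice-site handling); the function name and default scores are hypothetical choices for illustration.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment score of a and b plus one optimal alignment."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning prefixes a[:i] and b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell, preferring diagonal moves.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = needleman_wunsch("GATTACA", "GCATGCT")
print(best)
print(row1)
print(row2)
```

Unlike local (Smith-Waterman) alignment, every character of both inputs appears in the result, which suits the task of filling a known, bounded gap between Blat hits.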

Yuryev A. Integrating fragmented software applications into holistic solutions: focus on drug discovery. Expert Opin Drug Discov 2011;6:383-92. [PMID: 22646016 DOI: 10.1517/17460441.2011.557659] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]

Buckley KM, Rast JP. Characterizing immune receptors from new genome sequences. Methods Mol Biol 2011;748:273-98. [PMID: 21701981 DOI: 10.1007/978-1-61779-139-0_19] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/24/2022]

Hendrickson RC, Wang C, Hatcher EL, Lefkowitz EJ. Orthopoxvirus genome evolution: the role of gene loss. Viruses 2010;2:1933-1967. [PMID: 21994715 PMCID: PMC3185746 DOI: 10.3390/v2091933] [Citation(s) in RCA: 153] [Impact Index Per Article: 10.2] [Received: 07/14/2010] [Revised: 08/25/2010] [Accepted: 09/01/2010] [Indexed: 12/26/2022]