Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 64 (from Reference Citation Analysis)
Article PDFs: 32
Cited by > 0: 53
Searched Name: Gene prediction

Chen L, Ma J, Xiang S, Jiang L, Wang Y, Li Z, Liu X, Duan S, Luo Y, Xiao Y. Promotion of rice seedlings growth and enhancement of cadmium immobilization under cadmium stress with two types of organic fertilizer. Environ Pollut 2024;346:123619. [PMID: 38401632 DOI: 10.1016/j.envpol.2024.123619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/03/2024] [Accepted: 02/19/2024] [Indexed: 02/26/2024]

Jiang L, Dai J, Wang L, Chen L, Zeng G, Liu E, Zhou X, Yao H, Xiao Y, Fang J. Effect of nitrogen retention composite additives Ca(H₂PO₄)₂ and MgSO₄ on the degradation of lignocellulose, compost maturation, and fungal communities in compost. Environ Sci Pollut Res Int 2024:10.1007/s11356-024-32992-w. [PMID: 38558335 DOI: 10.1007/s11356-024-32992-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 03/15/2024] [Indexed: 04/04/2024]

Wu LF, Zhu WG, Yu EP, Cao HL, Wang ZF. Draft genome of Brasenia schreberi, a worldwide distributed and endangered aquatic plant. BMC Genom Data 2024;25:24. [PMID: 38438998 PMCID: PMC10913576 DOI: 10.1186/s12863-024-01212-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 02/21/2024] [Indexed: 03/06/2024] Open

Affiliation(s)

Lin-Fang Wu: Guangzhou Linfang Ecological Technology Co., Ltd, 510000, Guangzhou, China.
Wei-Guang Zhu: Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. Key Laboratory of National Forestry and Grassland Administration on Plant Conservation and Utilization in Southern China, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. South China National Botanical Garden, 510650, Guangzhou, China.
En-Ping Yu: Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. Key Laboratory of National Forestry and Grassland Administration on Plant Conservation and Utilization in Southern China, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. South China National Botanical Garden, 510650, Guangzhou, China. University of Chinese Academy of Sciences, 100049, Beijing, China.
Hong-Lin Cao: Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. Key Laboratory of National Forestry and Grassland Administration on Plant Conservation and Utilization in Southern China, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. South China National Botanical Garden, 510650, Guangzhou, China.
Zheng-Feng Wang: Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. Key Laboratory of National Forestry and Grassland Administration on Plant Conservation and Utilization in Southern China, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, 510650, Guangzhou, China. South China National Botanical Garden, 510650, Guangzhou, China.

Southey BR, Romanova EV, Rodriguez-Zas SL, Sweedler JV. Bioinformatics for Prohormone and Neuropeptide Discovery. Methods Mol Biol 2024;2758:151-178. [PMID: 38549013 PMCID: PMC11045269 DOI: 10.1007/978-1-0716-3646-6_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]

Ismail E, Gad W, Hashem M. A hybrid Stacking-SMOTE model for optimizing the prediction of autistic genes. BMC Bioinformatics 2023;24:379. [PMID: 37803253 PMCID: PMC10559615 DOI: 10.1186/s12859-023-05501-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 09/27/2023] [Indexed: 10/08/2023] Open

Abstract

PURPOSE

Autism spectrum disorder (ASD) is a disease associated with the neurodevelopment of the brain. It can be observed in early childhood, with symptoms usually appearing within the first year of life. Currently, ASD can only be diagnosed from apparent symptoms because information on the genes related to the disease is lacking. Therefore, this paper aims to predict as many disease-causing genes as possible to support better diagnosis.

METHODS

A hybrid stacking ensemble model with the Synthetic Minority Oversampling Technique (Stack-SMOTE) is proposed to predict the genes associated with ASD. The proposed model uses the gene ontology database to measure the similarities between genes with a hybrid gene similarity function (HGS). HGS is effective in measuring similarity because it combines the features of information gain-based methods and graph-based methods. The proposed model addresses the imbalance in the ASD dataset using the Synthetic Minority Oversampling Technique (SMOTE), which generates synthetic data rather than duplicating existing data, to reduce overfitting. Subsequently, a gradient boosting-based random forest classifier (GBBRF) is introduced as a new combination technique to enhance the prediction of ASD genes. Moreover, the GBBRF classifier is combined with random forest (RF), k-nearest neighbor, support vector machine (SVM), and logistic regression (LR) to form the proposed Stacking-SMOTE model, which optimizes the prediction of ASD genes.

RESULTS

The proposed Stacking-SMOTE model is evaluated using the Simons Foundation Autism Research Initiative (SFARI) gene database and a set of candidate ASD genes. The SMOTE-based model outperforms other reported undersampling and oversampling techniques. In addition, GBBRF achieves higher accuracy than the basic classifiers. Moreover, the experimental results show that the proposed Stacking-SMOTE model outperforms existing ASD prediction models, with approximately 95.5% accuracy.

CONCLUSION

The proposed Stacking-SMOTE model demonstrates that SMOTE is effective in handling the imbalanced autism data. In addition, integrating gradient boosting with the random forest classifier (GBBRF) helps build a robust stacking ensemble model (Stacking-SMOTE).
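The oversampling idea mentioned in the abstract above (generating synthetic minority samples by interpolation instead of duplicating existing ones) can be sketched in a few lines. This is a minimal illustration of generic SMOTE only, not the authors' implementation: the function name `smote`, the toy 2-D points, and the choice `k=2` are assumptions made for the example, and the HGS similarity function and the stacking classifiers are not modelled.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from minority-class samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Toy minority class with two made-up features per gene
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, the new data stay inside the region already occupied by the minority class, which is why the approach is less prone to overfitting than plain duplication.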

Brůna T, Li H, Guhlin J, Honsel D, Herbold S, Stanke M, Nenasheva N, Ebel M, Gabriel L, Hoff KJ. Galba: genome annotation with miniprot and AUGUSTUS. BMC Bioinformatics 2023;24:327. [PMID: 37653395 PMCID: PMC10472564 DOI: 10.1186/s12859-023-05449-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 08/21/2023] [Indexed: 09/02/2023] Open

Ismail E, Gad W, Hashem M. HEC-ASD: a hybrid ensemble-based classification model for predicting autism spectrum disorder disease genes. BMC Bioinformatics 2022;23:554. [PMID: 36544099 PMCID: PMC9768984 DOI: 10.1186/s12859-022-05099-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open

Zheng Z, Hu H, Gao S, Zhou H, Luo W, Kage U, Liu C, Jia J. Leaf thickness of barley: genetic dissection, candidate genes prediction and its relationship with yield-related traits. Theor Appl Genet 2022;135:1843-1854. [PMID: 35348823 DOI: 10.1007/s00122-022-04076-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Accepted: 03/07/2022] [Indexed: 06/14/2023]

Abstract

In this first genetic study on assessing leaf thickness directly in cereals, major and environmentally stable QTL were detected in barley and candidate genes underlying a major locus were identified. Leaf thickness (LT) is an important characteristic affecting leaf functions, which have been intensively studied. However, as LT has a small dimension in many plant species and is technically difficult to measure, previous studies on this characteristic are often based on indirect estimations. In the first study to detect QTL controlling LT by directly measuring the characteristic in barley, large and stable loci were detected in both field and glasshouse trials conducted in different cropping seasons by assessing a population of 201 recombinant inbred lines. Four loci (located on chromosome arms 2H, 3H, 5H and 6H, respectively) were consistently detected for flag leaf thickness (FLT) in each of these trials. The one on 6H had the largest effect, with a maximum LOD of 9.8, explaining up to 20.9% of phenotypic variance. FLT not only shows strong interactions with flag leaf width and flag leaf area but also correlates strongly with fertile tiller number, spike row types, kernel number per spike and heading date. Though with reduced efficiency, these loci were also detectable by assessing the second last leaf of fully grown plants or even the third leaves of seedlings. Taking advantage of the high-quality genome assemblies for both parents of the mapping population used in this study, three candidate genes underlying the 6H QTL were predicted based on orthologous analysis. These results not only broaden our understanding of the genetic basis of LT and its relationship with other traits in cereal crops but also form the basis for cloning and functional analysis of genes regulating LT in barley.
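As a rough plausibility check on the figures quoted above, a LOD score can be converted into an approximate proportion of phenotypic variance explained. The sketch below assumes a single-QTL model fitted by regression under normal errors, for which R² = 1 − 10^(−2·LOD/n); the abstract does not state which mapping method or which sample size per trial was used, so the function `variance_explained` and the use of n = 201 are assumptions for illustration, not a reproduction of the authors' analysis.

```python
def variance_explained(lod, n):
    """Approximate share of phenotypic variance explained by one QTL.

    Assumes a single-QTL regression model, where
    LOD = (n / 2) * log10(RSS_null / RSS_qtl), hence
    R^2 = 1 - 10 ** (-2 * LOD / n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - 10.0 ** (-2.0 * lod / n)


# Values taken from the abstract: maximum LOD 9.8, 201 recombinant inbred lines
r2 = variance_explained(lod=9.8, n=201)
print(f"approximate variance explained: {100 * r2:.1f}%")
```

Under these assumptions the estimate comes out near 20%, in the same range as the 20.9% reported; a small gap is expected because mapping software may use cofactors or a slightly different number of phenotyped lines.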

Hurgobin B. Annotation of Protein-Coding Genes in Plant Genomes. Methods Mol Biol 2022;2443:309-326. [PMID: 35037214 DOI: 10.1007/978-1-0716-2067-0_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]

Manoharan S, Iyyappan OR. A Hybrid Protocol for Finding Novel Gene Targets for Various Diseases Using Microarray Expression Data Analysis and Text Mining. Methods Mol Biol 2022;2496:41-70. [PMID: 35713858 DOI: 10.1007/978-1-0716-2305-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Ye J, Wang S, Yang X, Tang X. Gene prediction of aging-related diseases based on DNN and Mashup. BMC Bioinformatics 2021;22:597. [PMID: 34920719 DOI: 10.1186/s12859-021-04518-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/06/2021] [Accepted: 11/30/2021] [Indexed: 11/17/2022] Open

Abstract

Background

At present, bioinformatics research on the relationship between aging-related diseases and genes relies mainly on machine learning multi-label models that classify each gene. Most existing methods for predicting pathogenic genes either rely on specific types of gene features or directly encode multiple features of different dimensions and use the same encoder to concatenate them and predict the final results, which limits the applicability of the algorithm. Possible shortcomings of these approaches include incomplete coverage of gene features by a single type of biomics data, overfitting of small-dimensional datasets by a single encoder, and underfitting of larger-dimensional datasets.

Methods

We use known gene-disease association data and gene descriptors, such as gene ontology (GO) terms, protein-protein interaction (PPI) data, PathDIP, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), as input for deep learning to predict the association between genes and diseases. Our innovation is to use the Mashup algorithm to reduce the dimensionality of PPI, GO and other large biological networks, to add new pathway data from the KEGG database, and then to combine a variety of biological information sources through a modular Deep Neural Network (DNN) to predict the genes related to aging diseases.

Result and conclusion

The results show that our algorithm is more effective than the standard neural network algorithm (the area under the ROC curve improves from 0.8795 to 0.9153), the gradient enhanced tree classifier and the logistic regression classifier. In this paper, we first use a DNN to learn the similar genes associated with known diseases from the complex multi-dimensional feature space, and then provide evidence that the assumed genes are associated with a certain disease.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04518-5.
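The abstract above compares models by the area under the ROC curve (AUC). For readers unfamiliar with the metric, the sketch below computes AUC from its rank-based definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are made-up toy values, and `roc_auc` is an illustrative helper, not code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = disease-associated gene).
    scores: predicted scores, higher meaning more likely positive.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: three positives and four negatives
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic and only meant to make the definition explicit; a sort-based implementation would be preferable for large gene sets.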

Gabriel L, Hoff KJ, Brůna T, Borodovsky M, Stanke M. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics 2021;22:566. [PMID: 34823473 PMCID: PMC8620231 DOI: 10.1186/s12859-021-04482-0] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 06/04/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022] Open

Yang C, Chowdhury D, Zhang Z, Cheung WK, Lu A, Bian Z, Zhang L. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput Struct Biotechnol J 2021;19:6301-6314. [PMID: 34900140 PMCID: PMC8640167 DOI: 10.1016/j.csbj.2021.11.028] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 08/30/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 12/16/2022] Open

Kimbrel JA, Jeffrey BM, Ward CS. Prokaryotic Genome Annotation. Methods Mol Biol 2021;2349:193-214. [PMID: 34718997 DOI: 10.1007/978-1-0716-1585-0_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]

Yu J, Guo L, Dou X, Jiang W, Qian B, Liu J, Wang J, Wang C, Xu C. Comprehensive evaluation of protein-coding sORFs prediction based on a random sequence strategy. Front Biosci (Landmark Ed) 2021;26:272-278. [PMID: 34455759 DOI: 10.52586/4943] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/18/2021] [Revised: 06/08/2021] [Accepted: 07/07/2021] [Indexed: 11/09/2022]

Dziurzynski M, Decewicz P, Ciuchcinski K, Gorecki A, Dziewit L. Simple, Reliable, and Time-Efficient Manual Annotation of Bacterial Genomes with MAISEN. Methods Mol Biol 2021;2242:221-9. [PMID: 33961227 DOI: 10.1007/978-1-0716-1099-2_14] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/18/2023]

Karimi E, Geslain E, Belcour A, Frioux C, Aïte M, Siegel A, Corre E, Dittami SM. Robustness analysis of metabolic predictions in algal microbial communities based on different annotation pipelines. PeerJ 2021;9:e11344. [PMID: 33996285 PMCID: PMC8106915 DOI: 10.7717/peerj.11344] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/08/2020] [Accepted: 04/03/2021] [Indexed: 01/29/2023] Open

Abstract

Animals, plants, and algae rely on symbiotic microorganisms for their development and functioning. Genome sequencing and genomic analyses of these microorganisms provide opportunities to construct metabolic networks and to analyze the metabolism of the symbiotic communities they constitute. Genome-scale metabolic network reconstructions rest on information gained from genome annotation. As there are multiple annotation pipelines available, the question arises to what extent differences in annotation pipelines impact outcomes of these analyses. Here, we compare five commonly used pipelines (Prokka, MaGe, IMG, DFAST, RAST) from predicted annotation features (coding sequences, Enzyme Commission numbers, hypothetical proteins) to the metabolic network-based analysis of symbiotic communities (biochemical reactions, producible compounds, and selection of minimal complementary bacterial communities). While Prokka and IMG produced the most extensive networks, RAST and DFAST networks produced the fewest false positives and the most connected networks with the fewest dead-end metabolites. Our results underline differences between the outputs of the tested pipelines at all examined levels, with small differences in the draft metabolic networks resulting in the selection of different microbial consortia to expand the metabolic capabilities of the algal host. However, the consortia generated yielded similar predicted producible compounds and could therefore be considered functionally interchangeable. This contrast between selected communities and community functions depending on the annotation pipeline needs to be taken into consideration when interpreting the results of metabolic complementarity analyses. In the future, experimental validation of bioinformatic predictions will likely be crucial to both evaluate and refine the pipelines and needs to be coupled with increased efforts to expand and improve annotations in reference databases.

Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM. FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences. BMC Bioinformatics 2021;22:205. [PMID: 33879057 PMCID: PMC8056616 DOI: 10.1186/s12859-021-04120-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/05/2021] [Accepted: 04/07/2021] [Indexed: 12/23/2022] Open

Abstract

BACKGROUND

Gene annotation in eukaryotes is a non-trivial task that requires meticulous analysis of accumulated transcript data. Challenges include transcriptionally active regions of the genome that contain overlapping genes, genes that produce numerous transcripts, transposable elements and numerous diverse sequence repeats. Currently available gene annotation software applications depend on pre-constructed full-length gene sequence assemblies which are not guaranteed to be error-free. The origins of these sequences are often uncertain, making it difficult to identify and rectify errors in them. This hinders the creation of an accurate and holistic representation of the transcriptomic landscape across multiple tissue types and experimental conditions. Therefore, to gauge the extent of diversity in gene structures, a comprehensive analysis of genome-wide expression data is imperative.

RESULTS

We present FINDER, a fully automated computational tool that optimizes the entire process of annotating genes and transcript structures. Unlike current state-of-the-art pipelines, FINDER automates the RNA-Seq pre-processing step by working directly with raw sequence reads and optimizes gene prediction from BRAKER2 by supplementing these reads with associated proteins. The FINDER pipeline (1) reports transcripts and recognizes genes that are expressed under specific conditions, (2) generates all possible alternatively spliced transcripts from expressed RNA-Seq data, (3) analyzes read coverage patterns to modify existing transcript models and create new ones, and (4) scores genes as high- or low-confidence based on the available evidence across multiple datasets. We demonstrate the ability of FINDER to automatically annotate a diverse pool of genomes from eight species.

CONCLUSIONS

FINDER takes a completely automated approach to annotate genes directly from raw expression data. It is capable of processing eukaryotic genomes of all sizes and requires no manual supervision, making it ideal for bench researchers with limited experience in handling computational tools.

Yang X, Su Y, Wu J, Wan W, Chen H, Cao X, Wang J, Zhang Z, Wang Y, Ma D, Loake GJ, Jiang J. Parallel analysis of global garlic gene expression and alliin content following leaf wounding. BMC Plant Biol 2021;21:174. [PMID: 33838642 PMCID: PMC8035738 DOI: 10.1186/s12870-021-02948-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/28/2020] [Accepted: 03/29/2021] [Indexed: 06/12/2023]

Silva R, Padovani K, Góes F, Alves R. geneRFinder: gene finding in distinct metagenomic data complexities. BMC Bioinformatics 2021;22:87. [PMID: 33632132 PMCID: PMC7905635 DOI: 10.1186/s12859-021-03997-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/02/2020] [Accepted: 02/04/2021] [Indexed: 12/01/2022] Open

Abstract

Background

Microbes play a fundamental economic, social, and environmental role in our society. Metagenomics makes it possible to investigate microbes in their natural environments (the complex communities) and their interactions. The way they act is usually estimated by looking at the functions they perform in those environments, and their responsibility is measured by their genes. Advances in next-generation sequencing technology have facilitated metagenomics research; however, they also create a heavy computational burden. Large and complex biological datasets are available as never before. Many gene predictors are available that can aid the gene annotation process, though they do not handle metagenomic data complexities appropriately. There is no standard metagenomic benchmark data for gene prediction. Thus, gene predictors may inflate their results by obfuscating low false discovery rates.

Results

We introduce geneRFinder, an ML-based gene predictor able to outperform state-of-the-art gene prediction tools across this benchmark by using only one pre-trained Random Forest model. Average prediction rates of geneRFinder differed in percentage terms by 54% and 64%, respectively, against Prodigal and FragGeneScan while handling high complexity metagenomes. The specificity rate of geneRFinder had the largest distance against FragGeneScan, 79 percentage points, and 66 more than Prodigal. According to McNemar’s test, all percentage differences between the predictors' performances are statistically significant for all datasets at a 99% confidence level.

Conclusions

We provide geneRFinder, an approach for gene prediction in distinct metagenomic complexities, available at gitlab.com/r.lorenna/generfinder and https://osf.io/w2yd6/. We also provide novel, comprehensive benchmark data for gene prediction, based on the Critical Assessment of Metagenome Interpretation (CAMI) challenge and containing labeled data from gene regions, available at https://sourceforge.net/p/generfinder-benchmark.
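The abstract above relies on McNemar's test to compare two predictors evaluated on the same sequences. A minimal sketch of that test follows; it uses the chi-square form with continuity correction, which is one common variant and not necessarily the one the authors applied. The counts `b` and `c` (sequences that only one of the two predictors classified correctly) are invented for illustration.

```python
import math


def mcnemar(b, c, continuity=True):
    """McNemar's chi-square test for two paired classifiers.

    b: cases predictor A classified correctly and predictor B did not.
    c: cases predictor B classified correctly and predictor A did not.
    Returns (statistic, p_value) using a chi-square with 1 degree of freedom.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    # Survival function of chi-square(1): P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical discordant counts from a paired evaluation
stat, p_value = mcnemar(b=40, c=15)
print(f"statistic={stat:.3f}, p={p_value:.4f}")
print("significant at the 0.01 level:", p_value < 0.01)
```

Only the discordant pairs enter the statistic, so two predictors with very different overall accuracy can still be compared fairly on a shared benchmark.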

Yadav C, Smith M, Ogunremi D, Yack J. Draft genome assembly and annotation of the masked birch caterpillar, Drepana arcuata (Lepidoptera: Drepanoidea). Data Brief 2020;33:106531. [PMID: 33299908 PMCID: PMC7704289 DOI: 10.1016/j.dib.2020.106531] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/17/2020] [Revised: 11/05/2020] [Accepted: 11/09/2020] [Indexed: 11/12/2022] Open

Meyer C, Scalzitti N, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes. BMC Bioinformatics 2020;21:513. [PMID: 33172385 PMCID: PMC7656754 DOI: 10.1186/s12859-020-03855-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/28/2020] [Accepted: 10/30/2020] [Indexed: 11/10/2022] Open

Zhang Z, Liu L, Kucukoglu M, Tian D, Larkin RM, Shi X, Zheng B. Predicting and clustering plant CLE genes with a new method developed specifically for short amino acid sequences. BMC Genomics 2020;21:709. [PMID: 33045986 PMCID: PMC7552357 DOI: 10.1186/s12864-020-07114-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/11/2020] [Accepted: 09/29/2020] [Indexed: 11/21/2022] Open

Goel N, Singh S, Aseri TC. Global sequence features based translation initiation site prediction in human genomic sequences. Heliyon 2020;6:e04825. [PMID: 32964155 PMCID: PMC7490824 DOI: 10.1016/j.heliyon.2020.e04825] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/20/2019] [Revised: 05/25/2020] [Accepted: 08/26/2020] [Indexed: 11/26/2022] Open

Iqbal MN, Rasheed MA, Awais M, Chammam W, Kanwal S, Khan SU, Saddick S, Tlili I. BMT: Bioinformatics mini toolbox for comprehensive DNA and protein analysis. Genomics 2020;112:4561-6. [PMID: 32791200 DOI: 10.1016/j.ygeno.2020.08.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/08/2020] [Revised: 08/01/2020] [Accepted: 08/07/2020] [Indexed: 01/05/2023]

Ren M, Shi J, Jia J, Guo Y, Ni X, Shi T. Genotype-phenotype correlations of Berardinelli-Seip congenital lipodystrophy and novel candidate genes prediction. Orphanet J Rare Dis 2020;15:108. [PMID: 32349771 PMCID: PMC7191718 DOI: 10.1186/s13023-020-01383-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/24/2019] [Accepted: 04/13/2020] [Indexed: 11/29/2022] Open

Affiliation(s)

Meng Ren: Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China.
Jingru Shi: Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China.
Jinmeng Jia: Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China.
Yongli Guo: Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, MOE Key Laboratory of Major Diseases in Children, Beijing Children's Hospital, National Center for Children's Health, Beijing Pediatric Research Institute, Capital Medical University, Beijing, China. Biobank for Clinical Data and Samples in Pediatrics, Beijing Children's Hospital, National Center for Children's Health, Beijing Pediatric Research Institute, Capital Medical University, Beijing, China. Department of Otolaryngology, Head and Neck Surgery, Beijing Children's Hospital, National Center for Children's Health, Capital Medical University, Beijing, China.
Xin Ni: Beijing Key Laboratory for Pediatric Diseases of Otolaryngology, Head and Neck Surgery, MOE Key Laboratory of Major Diseases in Children, Beijing Children's Hospital, National Center for Children's Health, Beijing Pediatric Research Institute, Capital Medical University, Beijing, China. Biobank for Clinical Data and Samples in Pediatrics, Beijing Children's Hospital, National Center for Children's Health, Beijing Pediatric Research Institute, Capital Medical University, Beijing, China. Department of Otolaryngology, Head and Neck Surgery, Beijing Children's Hospital, National Center for Children's Health, Capital Medical University, Beijing, China.
Tieliu Shi: Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China. National Center for International Research of Biological Targeting Diagnosis and Therapy, Guangxi Key Laboratory of Biological Targeting Diagnosis and Therapy Research, Collaborative Innovation Center for Targeting Tumor Diagnosis and Therapy, Guangxi Medical University, Nanning, 530021, Guangxi, China.

Scalzitti N, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms. BMC Genomics 2020;21:293. [PMID: 32272892 PMCID: PMC7147072 DOI: 10.1186/s12864-020-6707-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 11/29/2019] [Accepted: 03/30/2020] [Indexed: 02/02/2023] Open

Gu X, Ding J, Liu W, Yang X, Yao L, Gao X, Zhang M, Yang S, Wen J. Comparative genomics and association analysis identifies virulence genes of Cercospora sojina in soybean. BMC Genomics 2020;21:172. [PMID: 32075575 PMCID: PMC7032006 DOI: 10.1186/s12864-020-6581-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/20/2019] [Accepted: 02/13/2020] [Indexed: 03/01/2023] Open

Abstract

BACKGROUND

Recently, a new strain of Cercospora sojina (Race15) has been identified, which has caused the breakdown of resistance in most soybean cultivars in China. Despite this serious yield reduction, little is known about why this strain is more virulent than others. Therefore, we sequenced the Race15 genome and compared it to the Race1 genome sequence, as its virulence is significantly lower. We then re-sequenced 30 isolates of C. sojina from different regions to identify differential virulence genes using genome-wide association analysis (GWAS).

RESULTS

The 40.12-Mb Race15 genome encodes 12,607 predicted genes and contains large numbers of gene clusters that have annotations in 11 different common databases. Comparative genomics revealed that although these two genomes had a large number of homologous genes, their genome structures have evolved to introduce 245 specific genes. The 5 most important candidate virulence genes were located on Contig 3 and Contig 1 and were mainly related to the regulation of metabolic mechanisms and the biosynthesis of bioactive metabolites, thereby putatively affecting fungal self-toxicity and reducing host resistance. Our study provides insight into the genomic basis of C. sojina pathogenicity and its infection mechanism, enabling future studies of this disease.

CONCLUSIONS

Via GWAS, we identified five candidate genes using three different methods, and these candidate genes are speculated to be related to metabolic mechanisms and the biosynthesis of bioactive metabolites. Meanwhile, Race15 specific genes may be linked with high virulence. The genes highly prevalent in virulent isolates should also be proposed as candidates, even though they were not found in our SNP analysis. Future work should focus on using a larger sample size to confirm and refine candidate gene identifications and should study the functional roles of these candidates, in order to investigate their potential roles in C. sojina pathogenicity.

Herndon N, Shelton J, Gerischer L, Ioannidis P, Ninova M, Dönitz J, Waterhouse RM, Liang C, Damm C, Siemanowski J, Kitzmann P, Ulrich J, Dippel S, Oberhofer G, Hu Y, Schwirz J, Schacht M, Lehmann S, Montino A, Posnien N, Gurska D, Horn T, Seibert J, Vargas Jentzsch IM, Panfilio KA, Li J, Wimmer EA, Stappert D, Roth S, Schröder R, Park Y, Schoppmeier M, Chung HR, Klingler M, Kittelmann S, Friedrich M, Chen R, Altincicek B, Vilcinskas A, Zdobnov E, Griffiths-Jones S, Ronshaugen M, Stanke M, Brown SJ, Bucher G. Enhanced genome assembly and a new official gene set for Tribolium castaneum. BMC Genomics 2020;21:47. [PMID: 31937263 PMCID: PMC6961396 DOI: 10.1186/s12864-019-6394-6] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 08/30/2019] [Accepted: 12/12/2019] [Indexed: 12/17/2022] Open

Abstract

Background

The red flour beetle Tribolium castaneum has emerged as an important model organism for the study of gene function in development and physiology, for ecological and evolutionary genomics, for pest control and a plethora of other topics. RNA interference (RNAi), transgenesis and genome editing are well established and the resources for genome-wide RNAi screening have become available in this model. All these techniques depend on a high quality genome assembly and precise gene models. However, the first version of the genome assembly was generated by Sanger sequencing with only a small set of RNA sequence data, which limited annotation quality.

Results

Here, we present an improved genome assembly (Tcas5.2) and an enhanced genome annotation resulting in a new official gene set (OGS3) for Tribolium castaneum, which significantly increase the quality of the genomic resources. By adding large-distance jumping library DNA sequencing to join scaffolds and fill small gaps, the gaps in the genome assembly were reduced and the N50 increased to 4753kbp. The precision of the gene models was enhanced by the use of a large body of RNA-Seq reads of different life history stages and tissue types, leading to the discovery of 1452 novel gene sequences. We also added new features such as alternative splicing, well defined UTRs and microRNA target predictions. For quality control, 399 gene models were evaluated by manual inspection. The current gene set was submitted to Genbank and accepted as a RefSeq genome by NCBI.

Conclusions

The new genome assembly (Tcas5.2) and the official gene set (OGS3) provide enhanced genomic resources for genetic work in Tribolium castaneum. The much improved information on transcription start sites supports transgenic and gene editing approaches. Further, novel types of information such as splice variants and microRNA target genes open additional possibilities for analysis.
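
The N50 statistic quoted in the Results of this abstract can be illustrated with a short sketch. It assumes the common definition of N50: the length L such that sequences of length at least L together contain at least half of all assembled bases. The function name and the toy scaffold lengths are hypothetical and are not taken from the Tcas5.2 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half of the
    # assembled bases have been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (arbitrary units), invented purely for illustration.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Joining scaffolds, as described in the abstract, replaces several shorter entries in the list with fewer longer ones, which is how such a step can raise the N50 without adding sequence.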

Affiliation(s)

Nicolae Herndon Department of Computer Science, East Carolina University, Greenville, NC, 27858, USA
Jennifer Shelton Division of Biology, Kansas State University, Manhattan, KS, 66506, USA
Lizzy Gerischer Institut für Mathematik und Informatik, Universität Greifswald, Greifswald, Germany
Panos Ioannidis Department of Genetic Medicine and Development, University of Geneva Medical School and Swiss Institute of Bioinformatics, 1211, Geneva, Switzerland
Maria Ninova Faculty of Biology, Medicine and Health, University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK
Jürgen Dönitz Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Robert M Waterhouse Department of Ecology and Evolution, University of Lausanne and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
Chun Liang Department of Biology, Miami University, Oxford, OH, 45056, USA
Carsten Damm Institut für Informatik, Fakultät für Mathematik und Informatik, Georg-August-Universität Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
Janna Siemanowski Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Peter Kitzmann Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Julia Ulrich Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Stefan Dippel Göttinger Graduiertenschule für Neurowissenschaften Biophysik und Molekulare Biowissenschaften, Georg-August-Universität Göttingen, Göttingen, Germany
Georg Oberhofer Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Yonggang Hu Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Jonas Schwirz Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Magdalena Schacht Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Sabrina Lehmann Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Alice Montino Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Nico Posnien Department of Developmental Biology, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Daniela Gurska Institute for Zoology: Developmental Biology, University of Cologne, Zülpicher Str. 47b, 50674, Cologne, Germany
Thorsten Horn Institute for Zoology: Developmental Biology, University of Cologne, Zülpicher Str. 47b, 50674, Cologne, Germany
Jan Seibert Institute for Zoology: Developmental Biology, University of Cologne, Zülpicher Str. 47b, 50674, Cologne, Germany
Iris M Vargas Jentzsch Institute for Zoology: Developmental Biology, University of Cologne, Zülpicher Str. 47b, 50674, Cologne, Germany
Kristen A Panfilio School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, CV4 7AL, UK
Jianwei Li Department of Developmental Biology, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Ernst A Wimmer Department of Developmental Biology, University of Göttingen, Justus-von-Liebig-Weg 11, 37077, Göttingen, Germany
Dominik Stappert Institute of Zoology: Developmental Biology, University of Cologne, Zülpicher Weg 47b, 50674, Cologne, Germany
Siegfried Roth Institute of Zoology: Developmental Biology, University of Cologne, Zülpicher Weg 47b, 50674, Cologne, Germany
Reinhard Schröder Institut für Biowissenschaften, Universität Rostock, Albert-Einstein-Str. 3, 18059, Rostock, Germany
Yoonseong Park Department of Entomology, Kansas State University, Manhattan, KS, 66506, USA
Michael Schoppmeier Department of Biology, Division of Developmental Biology, Friedrich-Alexander-University of Erlangen-Nürnberg, Staudtstr. 5, 91058, Erlangen, Germany
Ho-Ryun Chung Department of Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Ihnenstraße 63-73, 14195, Berlin, Germany
Martin Klingler Department of Biology, Division of Developmental Biology, Friedrich-Alexander-University of Erlangen-Nürnberg, Staudtstr. 5, 91058, Erlangen, Germany
Sebastian Kittelmann Oxford Brookes University, Centre for Functional Genomics, Gipsy Lane, Oxford, OX3 0BP, UK
Markus Friedrich Department of Anatomy and Cell Biology, Wayne State University, Detroit, MI, 48202, USA
Rui Chen Baylor College of Medicine, Houston, Texas, USA
Boran Altincicek Institute of Crop Science and Resource Conservation (INRES-Phytomedicine), Rheinische Friedrich-Wilhelms-University of Bonn, Bonn, Germany
Andreas Vilcinskas Institute for Insect Biotechnology, Justus-Liebig University of Giessen, Heinrich-Buff-Ring 26-32, 35392, Giessen, Germany
Evgeny Zdobnov Department of Genetic Medicine and Development, University of Geneva Medical School and Swiss Institute of Bioinformatics, 1211, Geneva, Switzerland
Sam Griffiths-Jones Faculty of Biology, Medicine and Health, University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK
Matthew Ronshaugen Faculty of Biology, Medicine and Health, University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK
Mario Stanke Institut für Mathematik und Informatik, Universität Greifswald, Greifswald, Germany.
Sue J Brown Division of Biology, Kansas State University, Manhattan, KS, 66506, USA.
Gregor Bucher Georg-August-Universität Göttingen, Göttingen, Germany.

Al-Ajlan A, El Allali A. CNN-MGP: Convolutional Neural Networks for Metagenomics Gene Prediction. Interdiscip Sci 2019;11:628-635. [PMID: 30588558 PMCID: PMC6841655 DOI: 10.1007/s12539-018-0313-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/23/2018] [Revised: 11/22/2018] [Accepted: 12/07/2018] [Indexed: 12/30/2022]

Wilbrandt J, Misof B, Panfilio KA, Niehuis O. Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models. BMC Genomics 2019;20:753. [PMID: 31623555 PMCID: PMC6798390 DOI: 10.1186/s12864-019-6064-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/18/2018] [Accepted: 08/27/2019] [Indexed: 02/06/2023] Open

Abstract

Background

The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative.

Results

Our results show that the subset of genes chosen for manual annotation by a research community (3.5–7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative for a species’ gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred from either the automatically predicted or manually annotated gene models alike. Vice versa, some previously reported trends did not appear in either the automatic or manually annotated gene sets, pointing towards insect-specific gene structural peculiarities.

Conclusions

In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. We acknowledge that analyses on the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.

Electronic supplementary material

The online version of this article (10.1186/s12864-019-6064-8) contains supplementary material, which is available to authorized users.
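
The structural properties compared in this abstract (lengths, exon counts, exonic proportion) can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the authors' pipeline: each gene model is assumed to be a single transcript given as non-overlapping exon intervals with 1-based inclusive coordinates (as in GFF), genes are assumed not to overlap one another, and all function names and coordinates are hypothetical.

```python
def gene_structure(exons):
    """Summarise one gene model given as a list of (start, end) exon intervals."""
    if not exons:
        raise ValueError("a gene model needs at least one exon")
    exons = sorted(exons)
    # Reject overlapping exons, which would double count bases.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start <= prev_end:
            raise ValueError("exons must not overlap")
    exonic_length = sum(end - start + 1 for start, end in exons)
    gene_span = exons[-1][1] - exons[0][0] + 1
    return {
        "exon_count": len(exons),
        "exonic_length": exonic_length,
        "gene_span": gene_span,
        "intronic_length": gene_span - exonic_length,
    }


def exonic_proportion(genes, genome_size):
    """Fraction of the genome covered by exons, assuming non-overlapping genes."""
    total_exonic = sum(gene_structure(g)["exonic_length"] for g in genes)
    return total_exonic / genome_size


# Toy coordinates, invented purely for illustration.
toy_genes = [
    [(101, 200), (301, 350), (501, 700)],
    [(1001, 1500)],
]
print(gene_structure(toy_genes[0]))
print(exonic_proportion(toy_genes, genome_size=10_000))
```

Computing these summaries separately for the automatically predicted and the manually annotated models of a species would give the two sets of values that a comparison of this kind relies on.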

Wolf DC, Cryder Z, Gan J. Soil bacterial community dynamics following surfactant addition and bioaugmentation in pyrene-contaminated soils. Chemosphere 2019;231:93-102. [PMID: 31128356 DOI: 10.1016/j.chemosphere.2019.05.145] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/20/2019] [Revised: 05/15/2019] [Accepted: 05/17/2019] [Indexed: 06/09/2023]

Caballero M, Wegrzyn J. gFACs: Gene Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gene Prediction Frameworks. Genomics Proteomics Bioinformatics 2019;17:305-310. [PMID: 31437583 PMCID: PMC6818179 DOI: 10.1016/j.gpb.2019.04.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/24/2018] [Revised: 03/21/2019] [Accepted: 04/29/2019] [Indexed: 11/26/2022]

Schiavinato M, Strasser R, Mach L, Dohm JC, Himmelbauer H. Genome and transcriptome characterization of the glycoengineered Nicotiana benthamiana line ΔXT/FT. BMC Genomics 2019;20:594. [PMID: 31324144 PMCID: PMC6642603 DOI: 10.1186/s12864-019-5960-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/07/2019] [Accepted: 07/08/2019] [Indexed: 01/21/2023] Open

Meher PK, Sahu TK, Gahoi S, Satpathy S, Rao AR. Evaluating the performance of sequence encoding schemes and machine learning methods for splice sites recognition. Gene 2019;705:113-126. [PMID: 31009682 DOI: 10.1016/j.gene.2019.04.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/20/2018] [Revised: 03/27/2019] [Accepted: 04/17/2019] [Indexed: 02/02/2023]

Abstract

Identification of splice sites is imperative for prediction of gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than rule-based methods for identification of splice sites. However, nucleotide strings must be transformed into numeric features through sequence encoding before they can be used as input to MLAs. In this study, we evaluated the performance of 8 different sequence encoding schemes, i.e., Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of both 1st and 2nd order Markov model (MM1 + MM2) and 2nd order Markov model (MM2), with respect to predicting donor and acceptor splice sites using 5 supervised learning methods (ANN, Bagging, Boosting, RF and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species, i.e., A. thaliana, C. elegans, D. melanogaster and H. sapiens, and then performances were validated with another four species, i.e., Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum and Trypanosoma brucei. In terms of ROC (receiver-operating-characteristics) and PR (precision-recall) curves, the FDTF encoding approach achieved the highest accuracy, followed by either MM2 or FDDM. Further, SVM was found to achieve the highest accuracy (in terms of ROC and PR curves), followed by RF, across encoding schemes and species. In terms of prediction accuracy across species, the SVM-FDTF combination performed better than other combinations of classifiers and encoding schemes. Further, splice site prediction accuracies were observed to be higher for the species with low intron density. To our limited knowledge, this is the first attempt at a comprehensive evaluation of sequence encoding schemes for prediction of splice sites. We have also developed an R-package EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html) for encoding of splice site motifs with different encoding schemes, which is expected to supplement the existing nucleotide sequence encoding approaches. This study is believed to be useful for computational biologists for predicting different functional elements on the genomic DNA.
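
To make the idea of sequence encoding concrete, the sketch below shows two simple ways a nucleotide window could be turned into numeric features before it is passed to a classifier. It is not the EncDNA implementation, and the paper's exact definitions may differ. The first function is a plain sparse (one-hot) encoding; the second pair is an assumed, simplified reading of a first-order Markov encoding in which each position is replaced by a position-specific transition probability estimated from aligned true sites, with a pseudocount that is also an assumption. All function names and the toy sequences are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def sparse_encode(seq):
    """One-hot encode a sequence: each base becomes four 0/1 indicators (A, C, G, T)."""
    vector = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


def fit_mm1(sites, pseudocount=1.0):
    """Estimate P(base at i | base at i-1) for each position from aligned sequences."""
    sites = [s.upper() for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sequences must have the same length")
    tables = []
    for i in range(1, length):
        counts = Counter((s[i - 1], s[i]) for s in sites)
        table = {}
        for prev in NUCLEOTIDES:
            row_total = sum(counts[(prev, cur)] for cur in NUCLEOTIDES) + 4 * pseudocount
            for cur in NUCLEOTIDES:
                table[(prev, cur)] = (counts[(prev, cur)] + pseudocount) / row_total
        tables.append(table)
    return tables


def mm1_encode(seq, tables):
    """Replace each adjacent pair of bases by its estimated transition probability."""
    seq = seq.upper()
    if len(seq) != len(tables) + 1:
        raise ValueError("sequence length does not match the fitted model")
    return [tables[i - 1][(seq[i - 1], seq[i])] for i in range(1, len(seq))]


# Toy aligned windows, invented purely for illustration.
toy_sites = ["CAGGTAAG", "AAGGTGAG", "GAGGTAAG"]
tables = fit_mm1(toy_sites)
print(sparse_encode("AC"))
print(mm1_encode("CAGGTAAG", tables))
```

The sparse vector grows with four features per base, whereas the Markov-style encoding yields one feature per adjacent pair and depends on the training sites used to fit it.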

Kawamoto M, Jouraku A, Toyoda A, Yokoi K, Minakuchi Y, Katsuma S, Fujiyama A, Kiuchi T, Yamamoto K, Shimada T. High-quality genome assembly of the silkworm, Bombyx mori. Insect Biochem Mol Biol 2019;107:53-62. [PMID: 30802494 DOI: 10.1016/j.ibmb.2019.02.002] [Citation(s) in RCA: 147] [Impact Index Per Article: 29.4] [Received: 12/15/2018] [Revised: 02/13/2019] [Accepted: 02/18/2019] [Indexed: 05/21/2023]

Affiliation(s)

Munetaka Kawamoto Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
Akiya Jouraku Institute of Agrobiological Sciences, National Agriculture and Food Research Organization (NARO), 1-2 Owashi, Tsukuba, Ibaraki, 305-8634, Japan
Atsushi Toyoda Comparative Genomics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan; Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
Kakeru Yokoi Institute of Agrobiological Sciences, National Agriculture and Food Research Organization (NARO), 1-2 Owashi, Tsukuba, Ibaraki, 305-8634, Japan
Yohei Minakuchi Comparative Genomics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
Susumu Katsuma Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
Asao Fujiyama Comparative Genomics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan; Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
Takashi Kiuchi Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan.
Kimiko Yamamoto Institute of Agrobiological Sciences, National Agriculture and Food Research Organization (NARO), 1-2 Owashi, Tsukuba, Ibaraki, 305-8634, Japan.
Toru Shimada Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan.

Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. Whole-Genome Annotation with BRAKER. Methods Mol Biol 2019;1962:65-95. [PMID: 31020555 PMCID: PMC6635606 DOI: 10.1007/978-1-4939-9173-0_5] [Citation(s) in RCA: 280] [Impact Index Per Article: 56.0] [Indexed: 01/17/2023]

Chan PP, Lowe TM. tRNAscan-SE: Searching for tRNA Genes in Genomic Sequences. Methods Mol Biol 2019;1962:1-14. [PMID: 31020551 DOI: 10.1007/978-1-4939-9173-0_1] [Citation(s) in RCA: 758] [Impact Index Per Article: 151.6] [Indexed: 12/25/2022]

Keilwagen J, Hartung F, Grau J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol Biol 2019;1962:161-177. [PMID: 31020559 DOI: 10.1007/978-1-4939-9173-0_9] [Citation(s) in RCA: 118] [Impact Index Per Article: 23.6] [Indexed: 05/08/2023]

Nachtweide S, Stanke M. Multi-Genome Annotation with AUGUSTUS. Methods Mol Biol 2019;1962:139-160. [PMID: 31020558 DOI: 10.1007/978-1-4939-9173-0_8] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 01/21/2023]

König S, Romoth L, Stanke M. Comparative Genome Annotation. Methods Mol Biol 2018;1704:189-212. [PMID: 29277866 DOI: 10.1007/978-1-4939-7463-4_6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]

Grouzdev DS, Tourova TP, Babich TL, Shevchenko MA, Sokolova DS, Abdullin RR, Poltaraus AB, Toshchakov SV, Nazina TN. Whole-genome sequence data and analysis of type strains 'Pusillimonas nitritireducens' and 'Pusillimonas subterraneus' isolated from nitrate- and radionuclide-contaminated groundwater in Russia. Data Brief 2018;21:882-887. [PMID: 30426040 PMCID: PMC6222257 DOI: 10.1016/j.dib.2018.10.060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/02/2018] [Revised: 10/11/2018] [Accepted: 10/17/2018] [Indexed: 12/01/2022] Open

Al-Ajlan A, El Allali A. Feature selection for gene prediction in metagenomic fragments. BioData Min 2018;11:9. [PMID: 30026811 PMCID: PMC6047368 DOI: 10.1186/s13040-018-0170-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/14/2018] [Accepted: 05/01/2018] [Indexed: 12/14/2022] Open

Zorio DAR, Monsma S, Sanes DH, Golding NL, Rubel EW, Wang Y. De novo sequencing and initial annotation of the Mongolian gerbil (Meriones unguiculatus) genome. Genomics 2018. [PMID: 29526484 DOI: 10.1016/j.ygeno.2018.03.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/28/2022]

Höps W, Jeffryes M, Bateman A. Gene Unprediction with Spurio: A tool to identify spurious protein sequences. F1000Res 2018;7:261. [PMID: 29721311 PMCID: PMC5897793 DOI: 10.12688/f1000research.14050.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Accepted: 02/27/2018] [Indexed: 11/20/2022] Open

Gschloessl B, Dorkeld F, Audiot P, Bretaudeau A, Kerdelhué C, Streiff R. De novo genome and transcriptome resources of the Adzuki bean borer Ostrinia scapulalis (Lepidoptera: Crambidae). Data Brief 2018;17:781-787. [PMID: 29785409 PMCID: PMC5958680 DOI: 10.1016/j.dib.2018.01.073] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/23/2017] [Revised: 01/23/2018] [Accepted: 01/25/2018] [Indexed: 11/25/2022] Open

Orgeur M, Martens M, Börno ST, Timmermann B, Duprez D, Stricker S. A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model. Biol Open 2018;7:bio.028498. [PMID: 29183907 PMCID: PMC5827264 DOI: 10.1242/bio.028498] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/20/2023] Open

Reid I. Evaluating Programs for Predicting Genes and Transcripts with RNA-Seq Support in Fungal Genomes. Methods Mol Biol 2018;1775:209-227. [PMID: 29876820 DOI: 10.1007/978-1-4939-7804-5_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]

Southey BR, Romanova EV, Rodriguez-Zas SL, Sweedler JV. Bioinformatics for Prohormone and Neuropeptide Discovery. Methods Mol Biol 2018;1719:71-96. [PMID: 29476505 DOI: 10.1007/978-1-4939-7537-2_5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/06/2023]

Haridas S, Salamov A, Grigoriev IV. Fungal Genome Annotation. Methods Mol Biol 2018;1775:171-184. [PMID: 29876818 DOI: 10.1007/978-1-4939-7804-5_15] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/05/2022]