1
Haddadi K, Ahmed Barghout R, Mahadevan R. KinMod database: a tool for investigating metabolic regulation. Database (Oxford) 2022; 2022:6759124. [PMID: 36222201 PMCID: PMC9554645 DOI: 10.1093/database/baac081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 08/08/2022] [Accepted: 10/07/2022] [Indexed: 11/05/2022]
Abstract
The ability of current kinetic models to simulate the phenotypic behaviour of cells is limited because cell metabolism is regulated at different levels, including enzyme regulation. The small molecule regulation network (SMRN) enables cells to respond rapidly to environmental fluctuations by controlling the activity of enzymes in metabolic pathways. However, the SMRN is less well studied than metabolic networks. The main contributors to the lack of knowledge on this regulatory system are the sparsity of experimental data and the absence of a standard framework for representing available information. In this paper, we introduce the KinMod database, which encompasses more than 2 million data points on the metabolism and metabolic regulation network of 9814 organisms. The KinMod database employs a hierarchical data structure to: (i) signify relationships between kinetic information obtained through in vitro experiments and proteins, with an emphasis on the SMRN, (ii) provide a thorough insight into available kinetic parameters and missing experimental measurements of this regulatory network and (iii) facilitate machine learning approaches for parameter estimation and accurate kinetic model construction by providing a homogeneous list of linked omics data. The hierarchical ontology of the KinMod database allows flexible exploration of data attributes and investigation of metabolic relationships within and across species. Identifying missing experimental values suggests additional experiments required for kinetic parameter estimation. Linking multi-omics data and providing data on the SMRN encourages the development of novel machine learning techniques for predicting missing kinetic parameters and promotes accurate kinetic model construction of cell metabolism by providing a comprehensive list of available kinetic measurements.
To illustrate the value of KinMod data, we develop six analyses to visualize associations between data classes belonging to separate sections of the metabolism. Through these analyses, we demonstrate that the KinMod database provides a unique framework for biologists and engineers to retrieve, evaluate and compare the functional metabolism of species, including the regulatory network, and discover the extent of available and missing experimental values of the metabolic regulation. Database URL: https://lmse.utoronto.ca/kinmod/KINMOD.sql.gz
Affiliation(s)
- Kiandokht Haddadi
- Laboratory for Metabolic Systems Engineering, BioZone, Center for Applied Biosciences and Bioengineering, Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St, Toronto, ON M5T 3A1, Canada
- Rana Ahmed Barghout
- *Correspondence to: Rana Ahmed Barghout, Laboratory for Metabolic Systems Engineering, BioZone, Center for Applied Biosciences and Bioengineering, Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St, Toronto, ON M5T 3A1, Canada
- Radhakrishnan Mahadevan
- Laboratory for Metabolic Systems Engineering, BioZone, Center for Applied Biosciences and Bioengineering, Department of Chemical Engineering & Applied Chemistry, University of Toronto, 200 College St, Toronto, ON M5T 3A1, Canada
2
Sublethal HPH treatment is a sustainable tool that induces autolytic-like processes in the early gene expression of Saccharomyces cerevisiae. Food Res Int 2022; 159:111589. [DOI: 10.1016/j.foodres.2022.111589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 06/23/2022] [Accepted: 06/24/2022] [Indexed: 11/21/2022]
3
James K, Alsobhe A, Cockell SJ, Wipat A, Pocock M. Integration of probabilistic functional networks without an external Gold Standard. BMC Bioinformatics 2022; 23:302. [PMID: 35879662 PMCID: PMC9316706 DOI: 10.1186/s12859-022-04834-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 07/11/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Probabilistic functional integrated networks (PFINs) are designed to aid our understanding of cellular biology and can be used to generate testable hypotheses about protein function. PFINs are generally created by scoring the quality of interaction datasets against a Gold Standard dataset, usually chosen from a separate high-quality data source, prior to their integration. Use of an external Gold Standard has several drawbacks, including data redundancy, data loss and the need for identifier mapping, which can complicate the network build and affect PFIN performance. Additionally, there are typically no Gold Standard data for non-model organisms. RESULTS We describe the development of an integration technique, ssNet, that scores and integrates both high-throughput and low-throughput data from a single source database in a consistent manner without the need for an external Gold Standard dataset. Using data from Saccharomyces cerevisiae, we show that ssNet is easier and faster, overcoming the challenges of data redundancy, Gold Standard bias and ID mapping. In addition, ssNet results in less loss of data and produces a more complete network. CONCLUSIONS The ssNet method allows PFINs to be built successfully from a single database, while producing network performance comparable to that of networks scored using an external Gold Standard source, with reduced data loss.
Affiliation(s)
- Katherine James
- Department of Applied Sciences, Northumbria University, Sandyford Rd, Newcastle upon Tyne, NE1 8ST, UK; Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
- Aoesha Alsobhe
- Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK; Saudi Electronic University, Abi Bakr As Siddiq Branch Rd, Riyadh, 1332, Saudi Arabia
- Simon J Cockell
- School of Biomedical, Nutritional and Sports Science, Faculty of Medical Sciences, Newcastle University, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK
- Anil Wipat
- Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
- Matthew Pocock
- Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
4
Kachroo AH, Vandeloo M, Greco BM, Abdullah M. Humanized yeast to model human biology, disease and evolution. Dis Model Mech 2022; 15:275614. [PMID: 35661208 PMCID: PMC9194483 DOI: 10.1242/dmm.049309] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 12/25/2022] Open
Abstract
For decades, budding yeast, a single-cellular eukaryote, has provided remarkable insights into human biology. Yeast and humans share several thousand genes despite morphological and cellular differences and over a billion years of separate evolution. These genes encode critical cellular processes, the failure of which in humans results in disease. Although recent developments in genome engineering of mammalian cells permit genetic assays in human cell lines, there is still a need to develop biological reagents to study human disease variants in a high-throughput manner. Many protein-coding human genes can successfully substitute for their yeast equivalents and sustain yeast growth, thus opening up doors for developing direct assays of human gene function in a tractable system referred to as 'humanized yeast'. Humanized yeast permits the discovery of new human biology by measuring human protein activity in a simplified organismal context. This Review summarizes recent developments showing how humanized yeast can directly assay human gene function and explore variant effects at scale. Thus, by extending the 'awesome power of yeast genetics' to study human biology, humanizing yeast reinforces the high relevance of evolutionarily distant model organisms to explore human gene evolution, function and disease.
5
Meng X, Xiang J, Zheng R, Wu FX, Li M. DPCMNE: Detecting Protein Complexes From Protein-Protein Interaction Networks Via Multi-Level Network Embedding. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1592-1602. [PMID: 33417563 DOI: 10.1109/tcbb.2021.3050102] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/12/2023]
Abstract
Biological functions of a cell are typically carried out through protein complexes. The detection of protein complexes is therefore of great significance for understanding cellular organization and protein functions. In the past decades, many computational methods have been proposed to detect protein complexes. However, most of the existing methods search only the local topological information to mine dense subgraphs as protein complexes, ignoring the global topological information. To tackle this issue, we propose the DPCMNE method to detect protein complexes via multi-level network embedding. It can preserve both the local and global topological information of biological networks. First, DPCMNE employs a hierarchical compressing strategy to recursively compress the input protein-protein interaction (PPI) network into multi-level smaller PPI networks. Then, a network embedding method is applied to these smaller PPI networks to learn protein embeddings at different levels of granularity. The embeddings learned from all the compressed PPI networks are concatenated to represent the final protein embeddings of the original input PPI network. Finally, a core-attachment based strategy is adopted to detect protein complexes in the weighted PPI network constructed from the pairwise similarity of protein embeddings. To assess the efficiency of our proposed method, DPCMNE is compared with eight other clustering algorithms on two yeast datasets. The experimental results show that DPCMNE outperforms these state-of-the-art complex detection methods in terms of F1 and F1+Acc. Furthermore, the results of functional enrichment analysis indicate that protein complexes detected by DPCMNE are more biologically significant in terms of P-score.
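The concatenation and re-weighting steps described in this abstract can be illustrated with a small sketch. It is not the DPCMNE implementation: it assumes that each compression level has already produced one vector per protein, and it uses cosine similarity as the pairwise similarity, which the abstract does not specify. The names `concat_levels`, `cosine` and `weight_edges`, and the toy vectors, are hypothetical.

```python
import math

def concat_levels(level_embeddings, protein):
    """Concatenate one protein's vectors from every compression level."""
    vector = []
    for level in level_embeddings:
        vector.extend(level[protein])
    return vector

def cosine(u, v):
    """Cosine similarity (an assumed choice of pairwise similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def weight_edges(edges, level_embeddings):
    """Weight each PPI edge by the similarity of its endpoints' final embeddings."""
    final = {p: concat_levels(level_embeddings, p) for edge in edges for p in edge}
    return {(a, b): cosine(final[a], final[b]) for a, b in edges}

# Toy data: two levels of granularity, three proteins (all values invented).
levels = [
    {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]},
    {"A": [1.0], "B": [1.0], "C": [0.0]},
]
edges = [("A", "B"), ("A", "C")]
weights = weight_edges(edges, levels)
```

In this toy network the A-B edge receives a weight close to 1 and the A-C edge a weight of 0, so a later clustering step would favour grouping A with B.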
6
Noori S, Al‐A'araji N, Al‐Shamery E. Construction of dynamic protein interaction network based on gene expression data and quartile one principle. Proteins 2022; 90:1219-1228. [DOI: 10.1002/prot.26304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Revised: 01/12/2022] [Accepted: 01/13/2022] [Indexed: 11/09/2022]
Affiliation(s)
- Soheir Noori
- Software Department, University of Babylon, Hillah, Babylon, Iraq
- Computer Science Department, University of Kerbala, Karbala, Iraq
- Eman Al‐Shamery
- Software Department, University of Babylon, Hillah, Babylon, Iraq
7
Engel SR, Wong ED, Nash RS, Aleksander S, Alexander M, Douglass E, Karra K, Miyasato SR, Simison M, Skrzypek MS, Weng S, Cherry JM. New data and collaborations at the Saccharomyces Genome Database: updated reference genome, alleles, and the Alliance of Genome Resources. Genetics 2022; 220:iyab224. [PMID: 34897464 PMCID: PMC9209811 DOI: 10.1093/genetics/iyab224] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 09/15/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023] Open
Abstract
Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.
Affiliation(s)
- Stacia R Engel
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Edith D Wong
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Robert S Nash
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Suzi Aleksander
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Micheal Alexander
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Eric Douglass
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Kalpana Karra
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Matt Simison
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Marek S Skrzypek
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Shuai Weng
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
8
Zhong J, Tang C, Peng W, Xie M, Sun Y, Tang Q, Xiao Q, Yang J. A novel essential protein identification method based on PPI networks and gene expression data. BMC Bioinformatics 2021; 22:248. [PMID: 33985429 PMCID: PMC8120700 DOI: 10.1186/s12859-021-04175-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/07/2020] [Accepted: 05/06/2021] [Indexed: 02/08/2023] Open
Abstract
Background Some proposed methods for identifying essential proteins achieve better results by using biological information. Gene expression data is generally used to identify essential proteins. However, gene expression data is prone to fluctuations, which may affect the accuracy of essential protein identification. Therefore, we propose an essential protein identification method based on gene expression and PPI network data that calculates the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network. Our experiments show that the method can improve the accuracy of essential protein prediction. Results In this paper, we propose a new measure named JDC, which is based on PPI network data and gene expression data. The JDC method offers a dynamic threshold method to binarize gene expression data. After that, it combines the degree centrality and the Jaccard similarity index to calculate the JDC score for each protein in the PPI network. We benchmark the JDC method on four organisms and evaluate it using ROC analysis, modular analysis, jackknife analysis, overlapping analysis, top analysis, and accuracy analysis. The results show that JDC performs better than DC, IC, EC, SC, BC, CC, NC, PeC, and WDC. We also compare JDC with the NF-PIN and TS-PIN methods, which predict essential proteins through active PPI networks constructed from dynamic gene expression. Conclusions We demonstrate that the new centrality measure, JDC, is more efficient than state-of-the-art prediction methods with the same input. The main ideas behind JDC are as follows: (1) Essential proteins are generally found in densely connected clusters in the PPI network. (2) Binarizing gene expression data can screen out fluctuations in gene expression profiles. (3) The essentiality of a protein depends on the similarity of the "active" and "inactive" states of gene expression in a cluster of the PPI network.
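The three ideas listed in this abstract (binarize expression, compare neighbours with the Jaccard index, reward high degree) can be sketched in a few lines. This is only an illustration and not the published JDC formula: the threshold rule (mean plus `k` population standard deviations per gene) and the way similarity is summed over neighbours are assumptions, and the names `active_timepoints`, `jaccard` and `jdc_like_score` are hypothetical.

```python
import statistics

def active_timepoints(profile, k=0.5):
    """Binarize one expression profile: indices above mean + k * std (assumed rule)."""
    threshold = statistics.mean(profile) + k * statistics.pstdev(profile)
    return {i for i, value in enumerate(profile) if value > threshold}

def jaccard(a, b):
    """Jaccard similarity index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def jdc_like_score(protein, neighbors, expression, k=0.5):
    """Sum of Jaccard similarities to all PPI neighbours.

    Summing over neighbours lets the score grow with degree, which is one
    simple (assumed) way to combine degree centrality with Jaccard similarity.
    """
    own = active_timepoints(expression[protein], k)
    return sum(
        jaccard(own, active_timepoints(expression[n], k))
        for n in neighbors[protein]
    )

# Toy data (invented): P1 and P2 are active at the same time points, P3 is not.
expression = {
    "P1": [1, 1, 9, 9],
    "P2": [0, 0, 8, 8],
    "P3": [9, 1, 1, 1],
}
neighbors = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1"]}
scores = {p: jdc_like_score(p, neighbors, expression) for p in expression}
```

Here P1 and P2 each score 1.0 because they are co-active with a neighbour, while P3 scores 0.0 because its only neighbour is never active at the same time.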
Affiliation(s)
- Jiancheng Zhong
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China; Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Changsha, 410083, China
- Chao Tang
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Wei Peng
- College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, Yunnan, China
- Minzhu Xie
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Yusui Sun
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Qiang Tang
- College of Engineering and Design, Hunan Normal University, Changsha, 410081, China
- Qiu Xiao
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Jiahong Yang
- School of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
9
Different Routes of Protein Folding Contribute to Improved Protein Production in Saccharomyces cerevisiae. mBio 2020; 11:mBio.02743-20. [PMID: 33173005 PMCID: PMC7667031 DOI: 10.1128/mbio.02743-20] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/08/2023] Open
Abstract
Protein folding plays an important role in protein maturation and secretion. In recombinant protein production, many studies have focused on the folding pathway to improve productivity. Here, we identified two different routes for improving protein production by yeast. We found that improving folding precision is a better strategy. Dysfunction of this process is also associated with several aberrant protein-associated human diseases. Here, our findings about the role of glucosidase Cwh41p in the precision control system and the characterization of the strain with a more precise folding process could contribute to the development of novel therapeutic strategies. Protein folding is often considered the flux controlling process in protein synthesis and secretion. Here, two previously isolated Saccharomyces cerevisiae strains with increased α-amylase productivity were analyzed in chemostat cultures at different dilution rates using multi-omics data. Based on the analysis, we identified different routes of the protein folding pathway to improve protein production. In the first strain, the increased abundance of proteins working on the folding process, coordinated with upregulated glycogen metabolism and trehalose metabolism, helped increase α-amylase productivity 1.95-fold compared to the level in the original strain in chemostat culture at a dilution rate of 0.2/h. The second strain further strengthened the folding precision to improve protein production. More precise folding helps the cell improve protein production efficiency and reduce the expenditure of energy on the handling of misfolded proteins. As calculated using an enzyme-constrained genome-scale metabolic model, the second strain had an increased productivity of 2.36-fold with lower energy expenditure than that of the original under the same condition. 
Further study revealed that the regulation of N-glycans played an important role in the folding precision control and that overexpression of the glucosidase Cwh41p can significantly improve protein production, especially for the strains with improved folding capacity but lower folding precision. Our findings elucidated in detail the mechanisms in two strains having improved protein productivity and thereby provided novel insights for industrial recombinant protein production as well as demonstrating how multi-omics analysis can be used for identification of novel strain-engineering targets.
10
Nepomuceno-Chamorro IA, Nepomuceno JA, Galván-Rojas JL, Vega-Márquez B, Rubio-Escudero C. Using prior knowledge in the inference of gene association networks. Appl Intell 2020. [DOI: 10.1007/s10489-020-01705-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
11
Zeng X, Lin Y, He Y, Lu L, Min X, Rodriguez-Paton A. Deep Collaborative Filtering for Prediction of Disease Genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1639-1647. [PMID: 30932845 DOI: 10.1109/tcbb.2019.2907536] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 06/09/2023]
Abstract
Accurate prioritization of potential disease genes is a fundamental challenge in biomedical research. Various algorithms have been developed to solve such problems. Inductive Matrix Completion (IMC) is one of the most reliable models because of its well-established framework and its superior performance in predicting gene-disease associations. However, the IMC method does not hierarchically extract deep features, which might limit the quality of recovery. To address this, a deep learning architecture, which obtains high-level representations and handles noise and outliers present in large-scale biological datasets, is applied to the side information of genes in our Deep Collaborative Filtering (DCF) model. Further, because negative examples are lacking, we also apply a Positive-Unlabeled (PU) learning formulation to low-rank matrix completion. Our approach achieves substantially improved performance over other state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database. Our approach is 10 percent more efficient than standard IMC in detecting a true association, and significantly outperforms other alternatives in terms of the precision-recall metric at the top-k predictions. Moreover, we also validate the method on diseases with no previously known gene associations and on newly reported OMIM associations. The experimental results show that DCF is still satisfactory for ranking novel disease phenotypes as well as mining unexplored relationships. The source code and the data are available at https://github.com/xzenglab/DCF.
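The evaluation at the top-k predictions mentioned in this abstract is commonly computed as precision@k and recall@k. The sketch below shows those standard definitions on invented data. It is not taken from the DCF code base, and the function names and gene identifiers are placeholders.

```python
def precision_at_k(ranked_genes, known_genes, k):
    """Fraction of the top-k ranked genes that are known disease genes."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for gene in ranked_genes[:k] if gene in known_genes)
    return hits / k

def recall_at_k(ranked_genes, known_genes, k):
    """Fraction of the known disease genes recovered within the top k."""
    if not known_genes:
        raise ValueError("known_genes must not be empty")
    hits = sum(1 for gene in ranked_genes[:k] if gene in known_genes)
    return hits / len(known_genes)

# Toy ranking for one disease (placeholder gene names, best score first).
ranked = ["G3", "G1", "G7", "G2", "G9"]
known = {"G1", "G2", "G5"}

p_at_2 = precision_at_k(ranked, known, 2)  # one hit (G1) in the top 2
r_at_4 = recall_at_k(ranked, known, 4)     # G1 and G2 found, G5 missed
```

Because only positive associations are labelled in this setting, a gene counted as a miss may simply be an undiscovered association, so these figures are best read as lower bounds.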
12
Wekesa JS, Luan Y, Meng J. Predicting Protein Functions Based on Differential Co-expression and Neighborhood Analysis. J Comput Biol 2020; 28:1-18. [PMID: 32302512 DOI: 10.1089/cmb.2019.0120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023] Open
Abstract
Proteins are polypeptides essential in biological processes. Protein physical interactions are complemented by other types of functional relationship data including genetic interactions, knowledge about co-expression, and evolutionary pathways. Existing algorithms integrate protein interaction and gene expression data to retrieve context-specific subnetworks composed of genes/proteins with known and unknown functions. However, most protein function prediction algorithms fail to exploit diverse intrinsic information in feature and label spaces. We develop a novel integrative method based on differential Co-expression analysis and a Neighbor-voting algorithm for Protein Function Prediction, namely CNPFP. The method integrates heterogeneous data and exploits intrinsic and latent linkages via a global iterative approach and genomic features. CNPFP performs three tasks: clustering, differential co-expression analysis, and protein function prediction. Our aim is to identify yeast cell cycle-specific proteins linked to differentially expressed proteins in the protein-protein interaction network. To capture intrinsic information, CNPFP selects the most relevant feature subset based on a global iterative neighbor-voting algorithm. We identify eight condition-specific modules. The most relevant subnetwork has 87 genes highly enriched with cyclin-dependent kinases, a family of protein kinases relevant for cell cycle regulation. We present comprehensive annotations for 3538 Saccharomyces cerevisiae proteins. Our method achieves an AUROC of 0.9862, accuracy of 0.9710, and F-score of 0.9691. From the results, we conclude that exploiting the intrinsic nature of protein relationships improves the quality of function prediction. Thus, the proposed method is useful in functional genomics studies.
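Neighbor voting, which this abstract builds on, can be shown in its simplest form: an unannotated protein is assigned the functions that occur most often among its interaction partners. The sketch below is plain majority voting on invented data. It does not reproduce CNPFP's global iterative variant or its feature selection, and the name `neighbor_vote` is hypothetical.

```python
from collections import Counter

def neighbor_vote(protein, neighbors, annotations, top_n=1):
    """Return the top_n functions most frequent among a protein's neighbours."""
    votes = Counter()
    for partner in neighbors.get(protein, ()):
        # Each annotated neighbour casts one vote per function it carries.
        votes.update(annotations.get(partner, ()))
    return [function for function, _ in votes.most_common(top_n)]

# Toy network (invented): X is unannotated and has three annotated partners.
neighbors = {"X": ["A", "B", "C"]}
annotations = {
    "A": ["cell cycle"],
    "B": ["cell cycle", "transport"],
    "C": ["cell cycle"],
}

best = neighbor_vote("X", neighbors, annotations)
top_two = neighbor_vote("X", neighbors, annotations, top_n=2)
```

With three votes for "cell cycle" and one for "transport", X is predicted to be a cell cycle protein. An iterative scheme would feed such predictions back into the annotation table and repeat the vote.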
Affiliation(s)
- Jael Sanyanda Wekesa
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
- School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
- Yushi Luan
- School of Life Science and Biotechnology, Dalian University of Technology, Dalian, China
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, China
13
Genome-wide identification and characterization of R2R3-MYB family in Hypericum perforatum under diverse abiotic stresses. Int J Biol Macromol 2020; 145:341-354. [DOI: 10.1016/j.ijbiomac.2019.12.100] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 08/30/2019] [Revised: 11/17/2019] [Accepted: 12/12/2019] [Indexed: 12/11/2022]
14
Wang R, Liu G, Wang C. Identifying protein complexes based on an edge weight algorithm and core-attachment structure. BMC Bioinformatics 2019; 20:471. [PMID: 31521132 PMCID: PMC6744658 DOI: 10.1186/s12859-019-3007-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/20/2018] [Accepted: 07/26/2019] [Indexed: 02/02/2023] Open
Abstract
Background Protein complex identification from protein-protein interaction (PPI) networks is crucial for understanding cellular organization principles and functional mechanisms. In recent decades, numerous computational methods have been proposed to identify protein complexes. However, most of the current state-of-the-art methods still have challenges to resolve, including high false-positive rates, an inability to identify overlapping complexes, a lack of consideration for the inherent organization within protein complexes, and the absence of some biological attachment proteins. Results In this paper, to overcome these limitations, we present a protein complex identification method based on an edge weight method and core-attachment structure (EWCA), in which a complex consists of a core and some sparse attachment proteins. First, we propose a new weighting method to assess the reliability of interactions. Second, we identify protein complex cores by using the structural similarity between a seed and its direct neighbors. Third, we introduce a new method to detect attachment proteins that is able to distinguish and identify peripheral proteins and overlapping proteins. Finally, we bind attachment proteins to their corresponding complex cores to form protein complexes and discard redundant protein complexes. The experimental results indicate that EWCA outperforms existing state-of-the-art methods in terms of both accuracy and p-value. Furthermore, EWCA could identify many more protein complexes with statistical significance. Additionally, EWCA could achieve a better balance between accuracy and efficiency than some high-accuracy state-of-the-art methods. Conclusions In summary, a comprehensive comparison with twelve algorithms on different evaluation metrics shows that EWCA performs better at protein complex identification. The datasets and software are freely available for academic research at https://github.com/RongquanWang/EWCA.
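The core-attachment idea in this abstract can be sketched as follows. The code is not the EWCA algorithm: it assumes a common definition of structural similarity (shared closed neighbourhoods divided by the geometric mean of their sizes), and both thresholds are arbitrary illustrative values. The names `structural_similarity`, `core_of` and `attachments_of` are hypothetical.

```python
import math

def structural_similarity(graph, u, v):
    """Overlap of closed neighbourhoods, normalised by their geometric mean."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))

def core_of(graph, seed, threshold=0.8):
    """Seed plus the direct neighbours that are structurally similar to it."""
    similar = {
        n for n in graph[seed]
        if structural_similarity(graph, seed, n) >= threshold
    }
    return {seed} | similar

def attachments_of(graph, core, min_fraction=0.5):
    """Outside proteins that interact with at least min_fraction of the core."""
    candidates = set().union(*(graph[p] for p in core)) - core
    return {
        c for c in candidates
        if len(graph[c] & core) / len(core) >= min_fraction
    }

# Toy PPI network (invented): A-B-C form a triangle, D touches A and B, E hangs off D.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B", "E"},
    "E": {"D"},
}

core = core_of(graph, "A")
attachments = attachments_of(graph, core)
complex_members = core | attachments
```

With seed A, the core is the triangle {A, B, C}. D is added as an attachment because it interacts with two of the three core proteins, while E is left out.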
Affiliation(s)
- Rongquan Wang
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
- Guixia Liu
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
- Caixia Wang
- School of International Economics, China Foreign Affairs University, 24 Zhanlanguan Road, Xicheng District, Beijing, 100037, China
15
Sahu PK, Salim S, Pp M, Chauhan S, Tomar RS. Reverse genetic analysis of yeast YPR099C/MRPL51 reveals a critical role of both overlapping ORFs in respiratory growth and MRPL51 in mitochondrial DNA maintenance. FEMS Yeast Res 2019; 19:5543219. [PMID: 31374566 DOI: 10.1093/femsyr/foz056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2019] [Accepted: 08/01/2019] [Indexed: 11/14/2022] Open
Abstract
The Saccharomyces cerevisiae genome contains 6572 ORFs, of which 680 are classified as dubious ORFs. A dubious ORF is a small, noncoding, nonconserved ORF that overlaps with another ORF on the complementary strand. Our study characterizes a dubious/nondubious ORF pair, YPR099C/MRPL51, and shows the transcript and protein level expression of YPR099C. Its subcellular localization was observed in the mitochondria. The overlapping ORF, MRPL51, encodes a mitochondrial ribosomal protein of the large subunit. Deletion of either ORF of the YPR099C/MRPL51 pair induces common phenotypes, i.e. loss of mtDNA, lack of mitochondrial fusion and lack of respiratory growth, due to the double deletion (ypr099cΔ/Δmrpl51Δ/Δ) caused by sequence overlap. Hence, we created single deletions of each ORF of the YPR099C/MRPL51 pair by an alternative approach to distinguish their phenotypes and identify their specific functions. Both ORFs were found to be essential for functional mitochondria and respiratory growth, but MRPL51 showed a specific requirement in mtDNA stability. The mechanism of mtDNA maintenance by Mrpl51 is probably dependent on Mhr1, which physically interacts with Mrpl51 and also regulates mtDNA repair. Overall, our study provides strong evidence for the protein level expression of a dubious ORF, YPR099C, and the bifunctional role of Mrpl51 in mtDNA maintenance.
Affiliation(s)
- Pushpendra Kumar Sahu
- Laboratory of Chromatin Biology, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal 462066, Madhya Pradesh, India
| | - Sagar Salim
- Laboratory of Chromatin Biology, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal 462066, Madhya Pradesh, India
| | - Mubthasima Pp
- Laboratory of Chromatin Biology, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal 462066, Madhya Pradesh, India
| | - Sakshi Chauhan
- Laboratory of Chromatin Biology, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal 462066, Madhya Pradesh, India
| | - Raghuvir Singh Tomar
- Laboratory of Chromatin Biology, Department of Biological Sciences, Indian Institute of Science Education and Research Bhopal, Bhopal 462066, Madhya Pradesh, India
| |
|
16
|
Wang R, Wang C, Sun L, Liu G. A seed-extended algorithm for detecting protein complexes based on density and modularity with topological structure and GO annotations. BMC Genomics 2019; 20:637. [PMID: 31390979 PMCID: PMC6686515 DOI: 10.1186/s12864-019-5956-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2019] [Accepted: 07/04/2019] [Indexed: 12/28/2022] Open
Abstract
Background The detection of protein complexes is of great significance for researching mechanisms underlying complex diseases and developing new drugs. Thus, various computational algorithms have been proposed for protein complex detection. However, most of these methods are based on only topological information and are sensitive to the reliability of interactions. As a result, their performance is affected by false-positive interactions in PPINs. Moreover, these methods consider only density and modularity and ignore protein complexes with various densities and modularities. Results To address these challenges, we propose an algorithm to exploit protein complexes in PPINs by a Seed-Extended algorithm based on Density and Modularity with Topological structure and GO annotations, named SE-DMTG to improve the accuracy of protein complex detection. First, we use common neighbors and GO annotations to construct a weighted PPIN. Second, we define a new seed selection strategy to select seed nodes. Third, we design a new fitness function to detect protein complexes with various densities and modularities. We compare the performance of SE-DMTG with that of thirteen state-of-the-art algorithms on several real datasets. Conclusion The experimental results show that SE-DMTG not only outperforms some classical algorithms in yeast PPINs in terms of the F-measure and Jaccard but also achieves an ideal performance in terms of functional enrichment. Furthermore, we apply SE-DMTG to PPINs of several other species and demonstrate the outstanding accuracy and matching ratio in detecting protein complexes compared with other algorithms.
Affiliation(s)
- Rongquan Wang
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
| | - Caixia Wang
- School of International Economics, China Foreign Affairs University, 24 Zhanlanguan Road, Xicheng District, Beijing, 100037, China
| | - Liyan Sun
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
| | - Guixia Liu
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China.
| |
|
17
|
Haque M, Sarmah R, Bhattacharyya DK. A common neighbor based technique to detect protein complexes in PPI networks. J Genet Eng Biotechnol 2019; 16:227-238. [PMID: 30647726 PMCID: PMC6296598 DOI: 10.1016/j.jgeb.2017.10.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2016] [Revised: 09/26/2017] [Accepted: 10/05/2017] [Indexed: 01/15/2023]
Abstract
Detection of protein complexes by analyzing and understanding PPI networks is an important task and critical to all aspects of cell biology. We present a technique called PROtein COmplex DEtection based on common neighborhood (PROCODE) that considers the inherent organization of protein complexes as well as the regions with heavy interactions in PPI networks to detect protein complexes. Initially, the core of the protein complexes is detected based on the neighborhood of the PPI network. Then a merging strategy based on density is used to attach proteins and protein complexes to the core-protein complexes to form biologically meaningful structures. The predicted protein complexes of PROCODE were evaluated and analyzed using four PPI network datasets, of which three were from budding yeast and one from human. Our proposed technique is compared with some of the existing techniques using standard benchmark complexes, and PROCODE was found to match very well with actual protein complexes in the benchmark data. The detected complexes were on par with existing biological evidence and knowledge.
|
18
|
Walvekar AS, Srinivasan R, Gupta R, Laxman S. Methionine coordinates a hierarchically organized anabolic program enabling proliferation. Mol Biol Cell 2018; 29:3183-3200. [PMID: 30354837 PMCID: PMC6340205 DOI: 10.1091/mbc.e18-08-0515] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2018] [Revised: 10/12/2018] [Accepted: 10/19/2018] [Indexed: 12/21/2022] Open
Abstract
Methionine availability during overall amino acid limitation metabolically reprograms cells to support proliferation, the underlying basis for which remains unclear. Here we construct the organization of this methionine-mediated anabolic program using yeast. Combining comparative transcriptome analysis and biochemical and metabolic flux-based approaches, we discover that methionine rewires overall metabolic outputs by increasing the activity of a key regulatory node. This comprises the pentose phosphate pathway (PPP) coupled with reductive biosynthesis, the glutamate dehydrogenase (GDH)-dependent synthesis of glutamate/glutamine, and pyridoxal-5-phosphate (PLP)-dependent transamination capacity. This PPP-GDH-PLP node provides the required cofactors and/or substrates for subsequent rate-limiting reactions in the synthesis of amino acids and therefore nucleotides. These rate-limiting steps in amino acid biosynthesis are also induced in a methionine-dependent manner. This thereby results in a biochemical cascade establishing a hierarchically organized anabolic program. For this methionine-mediated anabolic program to be sustained, cells co-opt a "starvation stress response" regulator, Gcn4p. Collectively, our data suggest a hierarchical metabolic framework explaining how methionine mediates an anabolic switch.
Affiliation(s)
- Adhish S. Walvekar
- Institute for Stem Cell biology and Regenerative Medicine (inStem), NCBS-TIFR campus, Bangalore 560065, India
| | - Rajalakshmi Srinivasan
- Institute for Stem Cell biology and Regenerative Medicine (inStem), NCBS-TIFR campus, Bangalore 560065, India
| | - Ritu Gupta
- Institute for Stem Cell biology and Regenerative Medicine (inStem), NCBS-TIFR campus, Bangalore 560065, India
| | - Sunil Laxman
- Institute for Stem Cell biology and Regenerative Medicine (inStem), NCBS-TIFR campus, Bangalore 560065, India
| |
|
19
|
Ray SS, Misra S. Genetic algorithm for assigning weights to gene expressions using functional annotations. Comput Biol Med 2018; 104:149-162. [PMID: 30472497 DOI: 10.1016/j.compbiomed.2018.11.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2018] [Revised: 11/13/2018] [Accepted: 11/13/2018] [Indexed: 12/17/2022]
Abstract
A method, named genetic algorithm for assigning weights to gene expressions using functional annotations (GAAWGEFA), is developed to assign proper weights to the gene expressions at each time point. The weights are estimated using functional annotations of the genes in a genetic algorithm framework. The method shows gene similarity in an improved manner as compared with other existing methods because it takes advantage of the existing functional annotations of the genes. The weight combination for the expressions at different time points is determined by maximizing the fitness function of GAAWGEFA in terms of the positive predictive value (PPV) for the top 10,000 gene pairs. The performance of the proposed method is primarily compared with Biweight mid correlation (BICOR) and original expression values for the six Saccharomyces cerevisiae datasets and one Bacillus subtilis dataset. The utility of GAAWGEFA is shown in predicting the functions of 48 unclassified genes (using p-value cutoff 10^-13) from Saccharomyces cerevisiae microarray data where the expressions are weighted using GAAWGEFA and are clustered using k-medoids algorithm. The related code along with various parameters is available at http://sampa.droppages.com/GAAWGEFA.html.
Affiliation(s)
- Shubhra Sankar Ray
- Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108, India.
| | - Sampa Misra
- Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108, India.
| |
|
20
|
Liu X, Yang Z, Sang S, Zhou Z, Wang L, Zhang Y, Lin H, Wang J, Xu B. Identifying protein complexes based on node embeddings obtained from protein-protein interaction networks. BMC Bioinformatics 2018; 19:332. [PMID: 30241459 PMCID: PMC6150962 DOI: 10.1186/s12859-018-2364-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2018] [Accepted: 09/09/2018] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Protein complexes are one of the keys to deciphering the behavior of a cell system. During the past decade, most computational approaches used to identify protein complexes have been based on discovering densely connected subgraphs in protein-protein interaction (PPI) networks. However, many true complexes are not dense subgraphs and these approaches show limited performance for detecting protein complexes from PPI networks. RESULTS To solve these problems, in this paper we propose a supervised learning method based on network node embeddings which utilizes the informative properties of known complexes to guide the search process for new protein complexes. First, node embeddings are obtained from a human protein interaction network. Then the protein interactions are weighted through the similarities between node embeddings. After that, the supervised learning method is used to detect protein complexes. Then a random forest model is used to filter the candidate complexes in order to obtain the final predicted complexes. Experimental results on real human and yeast protein interaction networks show that our method effectively improves the performance for protein complex detection. CONCLUSIONS We provided a new method for identifying protein complexes from human and yeast protein interaction networks, which has great potential to benefit the field of protein complex detection.
Affiliation(s)
- Xiaoxia Liu
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
| | - Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China.
| | - Shengtian Sang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
| | - Ziwei Zhou
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
| | - Lei Wang
- Beijing Institute of Health Administration and Medical Information, Beijing, 100850, People's Republic of China.
| | - Yin Zhang
- Beijing Institute of Health Administration and Medical Information, Beijing, 100850, People's Republic of China
| | - Hongfei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
| | - Jian Wang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
| | - Bo Xu
- School of Software Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
| |
|
21
|
Wang R, Liu G, Wang C, Su L, Sun L. Predicting overlapping protein complexes based on core-attachment and a local modularity structure. BMC Bioinformatics 2018; 19:305. [PMID: 30134824 PMCID: PMC6106838 DOI: 10.1186/s12859-018-2309-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2018] [Accepted: 07/30/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In recent decades, detecting protein complexes (PCs) from protein-protein interaction networks (PPINs) has been an active area of research. There are a large number of excellent graph clustering methods that work very well for identifying PCs. However, most existing methods overlook the inherent core-attachment organization of PCs. As a result, these methods have three major limitations that deserve attention. Firstly, many methods have ignored the importance of seed selection, in particular the impact of using overlapping nodes as seed nodes. Thus, there may be false predictions. Secondly, PCs are generally supposed to be dense subgraphs. However, subgraphs with high local modularity structure usually correspond to PCs. Thirdly, a number of available methods lack a noise-handling mechanism and miss some peripheral proteins. In summary, all these challenging issues are very important for predicting more biological overlapping PCs. RESULTS In this paper, to overcome these weaknesses, we propose a clustering method by core-attachment and local modularity structure, named CALM, to detect overlapping PCs from weighted PPINs with noise. Firstly, we identify overlapping nodes and seed nodes. Secondly, we calculate the support function between a node and a cluster. In CALM, a cluster, which initially consists of only a seed node, is extended by adding its direct neighboring nodes recursively according to the support function, until this cluster forms a locally optimal modularity subgraph. Thirdly, we repeat this process for the remaining seed nodes. Finally, merging and removing procedures are carried out to obtain the final predicted clusters. The experimental results show that CALM outperforms other classical methods and achieves ideal overall performance. Furthermore, CALM can match more complexes with a higher accuracy and provide a better one-to-one mapping with reference complexes in all test datasets. Additionally, CALM is robust against high rates of noise in PPINs. CONCLUSIONS By considering core-attachment and local modularity structure, CALM could detect PCs much more effectively than some representative methods. In short, CALM could potentially identify previously undiscovered overlapping PCs with various density and high modularity.
Affiliation(s)
- Rongquan Wang
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
| | - Guixia Liu
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
| | - Caixia Wang
- School of International Economics, China Foreign Affairs University, 24 Zhanlanguan Road, Xicheng District, Beijing, 100037 China
| | - Lingtao Su
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
| | - Liyan Sun
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012 China
| |
|
22
|
Liu W, Ma L, Jeon B, Chen L, Chen B. A Network Hierarchy-Based method for functional module detection in protein-protein interaction networks. J Theor Biol 2018; 455:26-38. [PMID: 29981337 DOI: 10.1016/j.jtbi.2018.06.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2018] [Revised: 06/27/2018] [Accepted: 06/29/2018] [Indexed: 02/02/2023]
Abstract
In the post-genomic era, one of the important tasks is to identify protein complexes and functional modules from high-throughput protein-protein interaction data, so that we can systematically analyze and understand the molecular functions and biological processes of cells. Although many functional module detection methods have been proposed, designing algorithms that detect functional modules correctly and efficiently is still a challenging and important scientific problem in computational biology. In this paper, we present a novel Network Hierarchy-Based method to detect functional modules in PPI networks (named NHB-FMD). NHB-FMD first constructs the hierarchy tree corresponding to the PPI network and then encodes the tree so that a genetic algorithm can be employed to obtain the hierarchy tree with maximum likelihood. After that, functional module partitioning is performed based on this tree and the best partitioning is selected as the result. Experimental results on real PPI networks show that the proposed algorithm not only significantly outperforms the state-of-the-art methods but also can detect protein modules more effectively and accurately.
Affiliation(s)
- Wei Liu
- College of Information Engineering of Yangzhou University, Yangzhou 225127, China; The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China; School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea.
| | - Liangyu Ma
- College of Information Engineering of Yangzhou University, Yangzhou 225127, China
| | - Byeungwoo Jeon
- School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea
| | - Ling Chen
- College of Information Engineering of Yangzhou University, Yangzhou 225127, China
| | - Bolun Chen
- The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China
| |
|
23
|
Heit C, Martin S, Yang F, Inglis D. Osmoadaptation of wine yeast (Saccharomyces cerevisiae) during Icewine fermentation leads to high levels of acetic acid. J Appl Microbiol 2018; 124:1506-1520. [DOI: 10.1111/jam.13733] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2017] [Revised: 01/19/2018] [Accepted: 02/05/2018] [Indexed: 11/28/2022]
Affiliation(s)
- C. Heit
- Cool Climate Oenology and Viticulture Institute; Brock University; St. Catharines ON Canada
- Centre for Biotechnology; Brock University; St. Catharines ON Canada
| | - S.J. Martin
- Cool Climate Oenology and Viticulture Institute; Brock University; St. Catharines ON Canada
- Centre for Biotechnology; Brock University; St. Catharines ON Canada
- Department of Biological Sciences; Brock University; St. Catharines ON Canada
| | - F. Yang
- Cool Climate Oenology and Viticulture Institute; Brock University; St. Catharines ON Canada
| | - D.L. Inglis
- Cool Climate Oenology and Viticulture Institute; Brock University; St. Catharines ON Canada
- Centre for Biotechnology; Brock University; St. Catharines ON Canada
- Department of Biological Sciences; Brock University; St. Catharines ON Canada
| |
|
24
|
Cao B, Deng S, Luo J, Ding P, Wang S. Identification of overlapping protein complexes by fuzzy K-medoids clustering algorithm in yeast protein-protein interaction networks. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2018. [DOI: 10.3233/jifs-17026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Buwen Cao
- School of Information Science and Engineering, Hunan City University, Yiyang, China
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Shuguang Deng
- College of Communication and Electronic Engineering, Hunan City University, Yiyang, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Pingjian Ding
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| | - Shulin Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
| |
|
25
|
Finding optimum width of discretization for gene expressions using functional annotations. Comput Biol Med 2017; 90:59-67. [DOI: 10.1016/j.compbiomed.2017.09.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2017] [Revised: 09/14/2017] [Accepted: 09/14/2017] [Indexed: 12/20/2022]
|
26
|
Boross G, Papp B. No Evidence That Protein Noise-Induced Epigenetic Epistasis Constrains Gene Expression Evolution. Mol Biol Evol 2017; 34:380-390. [PMID: 28025271 DOI: 10.1093/molbev/msw236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Changes in gene expression can affect phenotypes and therefore both its level and stochastic variability are frequently under selection. It has recently been proposed that epistatic interactions influence gene expression evolution: gene pairs where simultaneous knockout is more deleterious than expected should evolve reduced expression noise to avoid concurrent low expression of both proteins. In apparent support, yeast genes with many epistatic partners have low expression variation both among isogenic individuals and between species. However, the specific predictions and basic assumptions of this verbal model remain untested. Using bioinformatics analysis, we first demonstrate that the model's predictions are unsupported by available large-scale data. Based on quantitative biochemical modeling, we then show that epistasis between expression reductions (epigenetic epistasis) is not expected to aggravate the fitness cost of stochastic expression, which is in sharp contrast to the verbal argument. This nonintuitive result can be readily explained by the typical diminishing return of fitness on gene activity and by the fact that expression noise not only decreases but also increases the abundance of proteins. Overall, we conclude that stochastic variation in epistatic partners is unlikely to drive noise minimization or constrain gene expression divergence on a genomic scale.
Affiliation(s)
- Gábor Boross
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Szeged, Hungary
| | - Balázs Papp
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Szeged, Hungary
| |
|
27
|
Yang C, Ji J, Zhang A. BFO-FMD: bacterial foraging optimization for functional module detection in protein–protein interaction networks. Soft comput 2017. [DOI: 10.1007/s00500-017-2584-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
|
28
|
Oguz C, Watson LT, Baumann WT, Tyson JJ. Predicting network modules of cell cycle regulators using relative protein abundance statistics. BMC SYSTEMS BIOLOGY 2017; 11:30. [PMID: 28241833 PMCID: PMC5329933 DOI: 10.1186/s12918-017-0409-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/23/2016] [Accepted: 02/17/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Parameter estimation in systems biology is typically done by enforcing experimental observations through an objective function as the parameter space of a model is explored by numerical simulations. Past studies have shown that one usually finds a set of "feasible" parameter vectors that fit the available experimental data equally well, and that these alternative vectors can make different predictions under novel experimental conditions. In this study, we characterize the feasible region of a complex model of the budding yeast cell cycle under a large set of discrete experimental constraints in order to test whether the statistical features of relative protein abundance predictions are influenced by the topology of the cell cycle regulatory network. RESULTS Using differential evolution, we generate an ensemble of feasible parameter vectors that reproduce the phenotypes (viable or inviable) of wild-type yeast cells and 110 mutant strains. We use this ensemble to predict the phenotypes of 129 mutant strains for which experimental data is not available. We identify 86 novel mutants that are predicted to be viable and then rank the cell cycle proteins in terms of their contributions to cumulative variability of relative protein abundance predictions. Proteins involved in "regulation of cell size" and "regulation of G1/S transition" contribute most to predictive variability, whereas proteins involved in "positive regulation of transcription involved in exit from mitosis," "mitotic spindle assembly checkpoint" and "negative regulation of cyclin-dependent protein kinase by cyclin degradation" contribute the least. These results suggest that the statistics of these predictions may be generating patterns specific to individual network modules (START, S/G2/M, and EXIT). To test this hypothesis, we develop random forest models for predicting the network modules of cell cycle regulators using relative abundance statistics as model inputs. Predictive performance is assessed by the areas under receiver operating characteristics curves (AUC). Our models generate an AUC range of 0.83-0.87 as opposed to randomized models with AUC values around 0.50. CONCLUSIONS By using differential evolution and random forest modeling, we show that the model prediction statistics generate distinct network module-specific patterns within the cell cycle network.
Affiliation(s)
- Cihan Oguz
- Department of Biological Sciences, Virginia Tech, Blacksburg VA, 24061, USA.
| | - Layne T Watson
- Department of Computer Science, Virginia Tech, Blacksburg VA, 24061, USA; Department of Mathematics, Virginia Tech, Blacksburg VA, 24061, USA; Department of Aerospace and Ocean Engineering, Virginia Tech, Blacksburg VA, 24061, USA
| | - William T Baumann
- Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg VA, 24061, USA
| | - John J Tyson
- Department of Biological Sciences, Virginia Tech, Blacksburg VA, 24061, USA
| |
|
29
|
Ray SS, Misra S. A supervised weighted similarity measure for gene expressions using biological knowledge. Gene 2016; 595:150-160. [PMID: 27688070 DOI: 10.1016/j.gene.2016.09.033] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2016] [Revised: 08/18/2016] [Accepted: 09/22/2016] [Indexed: 11/17/2022]
Abstract
A supervised similarity measure for Saccharomyces cerevisiae gene expressions is developed which can capture the gene similarity when multiple types of experimental conditions, such as cell cycle and heat shock, are available for all the genes. The measure is called Weighted Pearson correlation (WPC), where the weights are systematically determined for each type of experiment by maximizing the positive predictive value for gene pairs having Pearson correlation greater than 0.80. The positive predictive value is computed by using the annotation information available from yeast GO-Slim process annotations in Saccharomyces Genome Database (SGD). Genes are then clustered by the k-medoid algorithm using the newly computed WPC, and functions of 135 unclassified genes are predicted with a p-value cutoff 10^-5 using Munich Information for Protein Sequences (MIPS) annotations. Out of these genes, functional categories of 55 genes are predicted with p-value cutoff greater than 10^-10 and reported in this investigation. The superiority of WPC as compared to some existing similarity measures like Pearson correlation and Euclidean distance is demonstrated using positive predictive values (PPV) of gene pairs for different Saccharomyces cerevisiae data sets. The related code is available at http://www.sampa.droppages.com/WPC.html.
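The abstract above does not give the WPC formula, so the following is only a minimal sketch of one plausible reading: compute a Pearson correlation per experiment type and combine them as a weighted average. The function names, the dictionary layout and the example weights are all assumptions made for illustration and are not taken from the paper or its published code.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return cov / sqrt(vx * vy)


def weighted_pearson(gene_a, gene_b, weights):
    """Weighted average of per-experiment-type Pearson correlations.

    gene_a, gene_b: dicts mapping experiment type -> expression values.
    weights: dict mapping experiment type -> non-negative weight
             (hypothetical values; the paper learns them from annotations).
    """
    total = sum(weights.values())
    return sum(
        w * pearson(gene_a[t], gene_b[t]) for t, w in weights.items()
    ) / total


# Two genes that agree under one condition and disagree under another.
gene_a = {"cell_cycle": [1, 2, 3, 4], "heat_shock": [1, 2, 3, 4]}
gene_b = {"cell_cycle": [2, 4, 6, 8], "heat_shock": [4, 3, 2, 1]}
weights = {"cell_cycle": 3.0, "heat_shock": 1.0}
score = weighted_pearson(gene_a, gene_b, weights)
```

With these toy inputs the per-type correlations are 1 and -1, so weighting the cell cycle data three times as heavily gives a combined score of 0.5; changing the weights shifts the score toward whichever experiment type is trusted more.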
Affiliation(s)
- Shubhra Sankar Ray
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India; Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India.
| | - Sampa Misra
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India.
| |
|
30
|
Keretsu S, Sarmah R. Weighted edge based clustering to identify protein complexes in protein-protein interaction networks incorporating gene expression profile. Comput Biol Chem 2016; 65:69-79. [PMID: 27771556 DOI: 10.1016/j.compbiolchem.2016.10.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2015] [Revised: 09/19/2016] [Accepted: 10/03/2016] [Indexed: 12/31/2022]
Abstract
Protein complex detection from protein-protein interaction (PPI) networks has received considerable attention in recent years. A number of methods identify protein complexes as dense sub-graphs using network information while several other methods detect protein complexes based on topological information. While the methods based on identifying dense sub-graphs are more effective in identifying protein complexes, not all protein complexes have high density. Moreover, existing methods focus more on static PPI networks and usually overlook the dynamic nature of protein complexes. Here, we propose a new method, Weighted Edge based Clustering (WEC), to identify protein complexes based on the weight of the edge between two interacting proteins, where the weight is defined by the edge clustering coefficient and the gene expression correlation between the interacting proteins. Our WEC method is capable of detecting highly inter-connected and co-expressed protein complexes. The experimental results of WEC on three real-life datasets show that our method can detect protein complexes effectively in comparison with other highly cited existing methods. AVAILABILITY The WEC tool is available at http://agnigarh.tezu.ernet.in/~rosy8/shared.html.
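As a rough illustration of an edge weight built from the two ingredients named above, the sketch below combines an edge clustering coefficient with a precomputed expression correlation. The abstract does not say how WEC combines them, so the linear blend and the `alpha` parameter are hypothetical, and the coefficient uses one common definition (shared neighbours divided by the maximum possible number of triangles through the edge), which may differ from the paper's.

```python
def edge_clustering_coefficient(graph, u, v):
    """Triangles through edge (u, v) divided by the maximum possible number.

    graph: dict mapping each protein to the set of its interaction partners.
    """
    common = len(graph[u] & graph[v])
    possible = min(len(graph[u]) - 1, len(graph[v]) - 1)
    return common / possible if possible > 0 else 0.0


def edge_weight(graph, u, v, expr_corr, alpha=0.5):
    """Blend topology and co-expression for one edge.

    expr_corr: expression correlation of u and v, computed elsewhere.
    alpha: hypothetical mixing parameter, not taken from the paper.
    """
    ecc = edge_clustering_coefficient(graph, u, v)
    return alpha * ecc + (1 - alpha) * expr_corr


# Toy network: A, B, C form a triangle; D hangs off B.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
w_ab = edge_weight(graph, "A", "B", expr_corr=0.6)
w_bd = edge_weight(graph, "B", "D", expr_corr=0.6)
```

In this toy network the edge A-B sits inside a triangle and so receives a higher weight than the pendant edge B-D, even though both are given the same expression correlation.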
Affiliation(s)
- Seketoulie Keretsu
- North Eastern Regional Institute of Science and Technology, Arunachal Pradesh, India.
| | - Rosy Sarmah
- Tezpur University, Assam, India. http://agnigarh.tezu.ernet.in/~rosy8/index.html
| |
|
31
|
Cao B, Luo J, Liang C, Wang S, Ding P. PCE-FR: A Novel Method for Identifying Overlapping Protein Complexes in Weighted Protein-Protein Interaction Networks Using Pseudo-Clique Extension Based on Fuzzy Relation. IEEE Trans Nanobioscience 2016; 15:728-738. [PMID: 27662678 DOI: 10.1109/tnb.2016.2611683] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/05/2022]
Abstract
Identifying overlapping protein complexes in protein-protein interaction (PPI) networks can provide insight into cellular functional organization and thus elucidate underlying cellular mechanisms. Recently, various algorithms for protein complex detection have been developed for PPI networks. However, the majority of algorithms depend primarily on network topological features and/or gene expression profiles, failing to consider the inherent biological meanings between protein pairs. In this paper, we propose a novel method to detect protein complexes using pseudo-clique extension based on fuzzy relation (PCE-FR). Our algorithm operates in three stages: it first forms the nonoverlapping protein substructure based on fuzzy relation and then expands each substructure by adding neighbor proteins to maximize the cohesive score. Finally, highly overlapped candidate protein complexes are merged to form the final protein complex set. In particular, our algorithm employs the biological significance hidden in protein pairs to construct edge weights for protein interaction networks. The experimental results show that our method can not only outperform classical algorithms such as CFinder, ClusterONE, CMC, RRW, HC-PIN, and ProRank+, but also achieve ideal overall performance in most of the yeast PPI datasets in terms of a composite score consisting of precision, accuracy, and separation. We further apply our method to a human PPI network from the HPRD dataset and demonstrate that it is very effective in detecting protein complexes compared to other algorithms.
|
32
|
Luo J, Lin D, Cao B. A cell-core-attachment approach for identifying protein complexes in yeast protein-protein interaction network. Journal of Intelligent & Fuzzy Systems 2016. [DOI: 10.3233/jifs-169026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/01/2023]
|
33
|
Ji J, Lv J, Yang C, Zhang A. Detecting Functional Modules Based on a Multiple-Grain Model in Large-Scale Protein-Protein Interaction Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:610-622. [PMID: 26394434 DOI: 10.1109/tcbb.2015.2480066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Detecting functional modules from a Protein-Protein Interaction (PPI) network is a fundamental and hot issue in proteomics research, where many computational approaches have played an important role in recent years. However, how to effectively and efficiently detect functional modules in large-scale PPI networks is still a challenging problem. We present a new framework, based on a multiple-grain model of PPI networks, to detect functional modules in PPI networks. First, we give a multiple-grain representation model of a PPI network, which has a smaller scale with super nodes. Next, we design the protein grain partitioning method, which employs a functional similarity or a structural similarity to merge some proteins layer by layer. Third, a refining mechanism with border node tests is proposed to address the protein overlapping of different modules during the grain eliminating process. Finally, systematic experiments are conducted on five large-scale yeast and human networks. The results show that the framework not only significantly reduces the running time of functional module detection, but also effectively identifies overlapping modules while maintaining competitive performance; thus it is well suited to detecting functional modules in large-scale PPI networks.
|
34
|
Faisal FE, Meng L, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP Journal on Bioinformatics & Systems Biology 2015; 2015:3. [PMID: 28194172 PMCID: PMC5270500 DOI: 10.1186/s13637-015-0022-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 11/10/2022]
Abstract
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Affiliation(s)
- Fazle E Faisal
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Lei Meng
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Joseph Crawford
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| | - Tijana Milenković
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
- ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
| |
|
35
|
Peters TW, Miller AW, Tourette C, Agren H, Hubbard A, Hughes RE. Genomic Analysis of ATP Efflux in Saccharomyces cerevisiae. G3 (Bethesda, Md.) 2015; 6:161-70. [PMID: 26585826 PMCID: PMC4704715 DOI: 10.1534/g3.115.023267] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/31/2015] [Accepted: 11/06/2015] [Indexed: 01/12/2023]
Abstract
Adenosine triphosphate (ATP) plays an important role as a primary molecule for the transfer of chemical energy to drive biological processes. ATP also functions as an extracellular signaling molecule in a diverse array of eukaryotic taxa in a conserved process known as purinergic signaling. Given the important roles of extracellular ATP in cell signaling, we sought to comprehensively elucidate the pathways and mechanisms governing ATP efflux from eukaryotic cells. Here, we present results of a genomic analysis of ATP efflux from Saccharomyces cerevisiae by measuring extracellular ATP levels in cultures of 4609 deletion mutants. This screen revealed key cellular processes that regulate extracellular ATP levels, including mitochondrial translation and vesicle sorting in the late endosome, indicating that ATP production and transport through vesicles are required for efflux. We also observed evidence for altered ATP efflux in strains deleted for genes involved in amino acid signaling, and mitochondrial retrograde signaling. Based on these results, we propose a model in which the retrograde signaling pathway potentiates amino acid signaling to promote mitochondrial respiration. This study advances our understanding of the mechanism of ATP secretion in eukaryotes and implicates TOR complex 1 (TORC1) and nutrient signaling pathways in the regulation of ATP efflux. These results will facilitate analysis of ATP efflux mechanisms in higher eukaryotes.
Affiliation(s)
| | - Aaron W Miller
- The Buck Institute for Research on Aging, Novato, California 94945
| | | | - Hannah Agren
- The Buck Institute for Research on Aging, Novato, California 94945
| | - Alan Hubbard
- School of Public Health, Division of Biostatistics, University of California, Berkeley, California 94729-7358
| | - Robert E Hughes
- The Buck Institute for Research on Aging, Novato, California 94945
| |
|
36
|
Teoh ST, Putri S, Mukai Y, Bamba T, Fukusaki E. A metabolomics-based strategy for identification of gene targets for phenotype improvement and its application to 1-butanol tolerance in Saccharomyces cerevisiae. Biotechnology for Biofuels 2015; 8:144. [PMID: 26379776 PMCID: PMC4570087 DOI: 10.1186/s13068-015-0330-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 04/16/2015] [Accepted: 08/28/2015] [Indexed: 05/23/2023]
Abstract
BACKGROUND Traditional approaches to phenotype improvement include rational selection of genes for modification, and probability-driven processes such as laboratory evolution or random mutagenesis. A promising middle-ground approach is semi-rational engineering, where genetic modification targets are inferred from system-wide comparison of strains. Here, we have applied a metabolomics-based, semi-rational strategy of phenotype improvement to 1-butanol tolerance in Saccharomyces cerevisiae. RESULTS Nineteen yeast single-deletion mutant strains with varying growth rates under 1-butanol stress were subjected to non-targeted metabolome analysis by GC/MS, and a regression model was constructed using metabolite peak intensities as predictors and stress growth rates as the response. From this model, metabolites positively and negatively correlated with growth rate were identified, including threonine and citric acid. Based on the assumption that these metabolites were linked to 1-butanol tolerance, new deletion strains accumulating higher threonine or lower citric acid were selected and subjected to tolerance measurement and metabolome analysis. The new strains exhibiting the predicted changes in metabolite levels also displayed a significantly higher growth rate under stress than the control strain, thus validating the link between these metabolites and 1-butanol tolerance. CONCLUSIONS A strategy for semi-rational phenotype improvement using metabolomics was proposed and applied to the 1-butanol tolerance of S. cerevisiae. Metabolites correlated with growth rate under 1-butanol stress were identified, and new mutant strains showing higher growth rate under stress could be selected based on these metabolites. The results demonstrate the potential of metabolomics in semi-rational strain engineering.
Affiliation(s)
- Shao Thing Teoh
- Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka, 565-0871 Japan
| | - Sastia Putri
- Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka, 565-0871 Japan
| | - Yukio Mukai
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology, 1266 Tamura, Nagahama, Shiga 526-0829 Japan
| | - Takeshi Bamba
- Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka, 565-0871 Japan
| | - Eiichiro Fukusaki
- Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka, 565-0871 Japan
| |
|
37
|
Wang Y, Feng L, Zhu Y, Li Y, Yan H, Xiang Y. Comparative genomic analysis of the WRKY III gene family in populus, grape, arabidopsis and rice. Biol Direct 2015; 10:48. [PMID: 26350041 PMCID: PMC4563840 DOI: 10.1186/s13062-015-0076-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 03/28/2015] [Accepted: 08/17/2015] [Indexed: 01/22/2023] Open
Abstract
Background WRKY III genes have significant functions in regulating plant development and resistance. In plants, the WRKY gene family has been studied in many species; however, a comprehensive analysis of WRKY III genes in the woody plant species poplar is still lacking. Three representative lineages of flowering plant species are incorporated in most analyses: Arabidopsis (a model plant for annual herbaceous dicots), grape (one model plant for perennial dicots) and Oryza sativa (a model plant for monocots). Results In this study, we identified 10, 6, 13 and 28 WRKY III genes in the genomes of Populus trichocarpa, grape (Vitis vinifera), Arabidopsis thaliana and rice (Oryza sativa), respectively. Phylogenetic analysis revealed that the WRKY III proteins could be divided into four clades. By microsynteny analysis, we found that the duplicated regions were more conserved between poplar and grape than Arabidopsis or rice. We dated their duplications by Ks analysis of Populus WRKY III genes and demonstrated that all the blocks were formed after the divergence of monocots and dicots. Strong purifying selection has played a key role in the maintenance of WRKY III genes in Populus. Tissue expression analysis of the WRKY III genes in Populus revealed that five were most highly expressed in the xylem. We also performed quantitative real-time reverse transcription PCR analysis of WRKY III genes in Populus treated with salicylic acid, abscisic acid and polyethylene glycol to explore their stress-related expression patterns. Conclusions This study highlighted the duplication and diversification of the WRKY III gene family in Populus and provided a comprehensive analysis of this gene family in the Populus genome. Our results indicated that the majority of WRKY III genes of Populus were expanded by large-scale gene duplication. The expression patterns of the PtrWRKYIII genes indicated that these genes play important roles in the xylem during poplar growth and development, and may play a crucial role in the defense against drought stress. Our results presented here may aid in the selection of appropriate candidate genes for further characterization of their biological functions in poplar. Reviewers This article was reviewed by Prof Dandekar and Dr Andrade-Navarro. Electronic supplementary material The online version of this article (doi:10.1186/s13062-015-0076-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yiyi Wang
- Laboratory of Modern Biotechnology, School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, 230036, China.
| | - Lin Feng
- Laboratory of Modern Biotechnology, School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, 230036, China.
| | - Yuxin Zhu
- Laboratory of Modern Biotechnology, School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, 230036, China.
| | - Yuan Li
- Laboratory of Modern Biotechnology, School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, 230036, China.
| | - Hanwei Yan
- Laboratory of Modern Biotechnology, School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, 230036, China.
| | - Yan Xiang
- Laboratory of Modern Biotechnology, School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, 230036, China; Key Laboratory of Crop Biology of Anhui Agriculture University, Hefei, 230036, China.
| |
|
38
|
Wang Y, Feng L, Zhu Y, Li Y, Yan H, Xiang Y. Comparative genomic analysis of the WRKY III gene family in populus, grape, arabidopsis and rice. Biol Direct 2015. [PMID: 26350041 DOI: 10.1186/s13062-015-0076-73] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/25/2023] Open
|
39
|
Wang Y, Feng L, Zhu Y, Li Y, Yan H, Xiang Y. Comparative genomic analysis of the WRKY III gene family in populus, grape, arabidopsis and rice. Biol Direct 2015. [PMID: 26350041 DOI: 10.1186/s13062-015-007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/30/2023] Open
|
40
|
Yu F, Yang Z, Hu X, Sun Y, Lin H, Wang J. Protein complex detection in PPI networks based on data integration and supervised learning method. BMC Bioinformatics 2015; 16 Suppl 12:S3. [PMID: 26329886 PMCID: PMC4705505 DOI: 10.1186/1471-2105-16-s12-s3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022] Open
Abstract
Background Revealing protein complexes is important for understanding principles of cellular organization and function. High-throughput experimental techniques have produced a large number of protein interactions, which makes it possible to predict protein complexes from protein-protein interaction (PPI) networks. However, the small number of known physical interactions may limit protein complex detection. Methods The new PPI networks are constructed by integrating PPI datasets with the large and readily available PPI data from biomedical literature, and then the less reliable PPIs between two proteins are filtered out based on the semantic similarity and topological similarity of the two proteins. Finally, supervised learning protein complex detection (SLPC), which can make full use of the information of available known complexes, is applied to detect protein complexes on the new PPI networks. Results The experimental results of SLPC on two different categories of yeast PPI networks demonstrate the effectiveness of the approach: compared with the original PPI networks, the best average improvements of 4.76, 6.81 and 15.75 percentage units in the F-score, accuracy and maximum matching ratio (MMR) are achieved respectively; compared with the denoising PPI networks, the best average improvements of 3.91, 4.61 and 12.10 percentage units in the F-score, accuracy and MMR are achieved respectively; compared with ClusterONE, the state-of-the-art complex detection method, on the denoising extended PPI networks, the average improvements of 26.02 and 22.40 percentage units in the F-score and MMR are achieved respectively. Conclusions The experimental results show that the performance of SLPC improves substantially through the integration of newly available PPI data from biomedical literature into original PPI networks and denoising PPI networks. In addition, our protein complex detection method can achieve better performance than ClusterONE.
|
41
|
Cao B, Luo J, Liang C, Wang S, Song D. MOEPGA: A novel method to detect protein complexes in yeast protein-protein interaction networks based on MultiObjective Evolutionary Programming Genetic Algorithm. Comput Biol Chem 2015; 58:173-81. [PMID: 26298638 DOI: 10.1016/j.compbiolchem.2015.06.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 02/02/2015] [Revised: 06/02/2015] [Accepted: 06/22/2015] [Indexed: 02/02/2023]
Abstract
The identification of protein complexes in protein-protein interaction (PPI) networks has greatly advanced our understanding of biological organisms. Existing computational methods to detect protein complexes are usually based on specific network topological properties of PPI networks. However, due to the inherent complexity of the network structures, the identification of protein complexes may not be fully addressed by using a single network topological property. In this study, we propose a novel MultiObjective Evolutionary Programming Genetic Algorithm (MOEPGA) which integrates multiple network topological features to detect biologically meaningful protein complexes. Our approach first systematically analyzes the multiobjective problem in terms of identifying protein complexes from PPI networks, and then constructs the objective function of the iterative algorithm based on three common topological properties of protein complexes from the benchmark dataset. Finally, we describe our algorithm, which mainly consists of three steps: population initialization, subgraph mutation and subgraph selection. To show the utility of our method, we compared MOEPGA with several state-of-the-art algorithms on two yeast PPI datasets. The experimental results demonstrate that the proposed method can not only find more protein complexes but also achieve higher accuracy in terms of F-score. Moreover, our approach can cover a certain number of proteins in the input PPI network in terms of the normalized clustering score. Taken together, our method can serve as a powerful framework to detect protein complexes in yeast PPI networks, thereby facilitating the identification of the underlying biological functions.
Affiliation(s)
- Buwen Cao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities in Hunan Province, China
| | - Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities in Hunan Province, China.
| | - Cheng Liang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities in Hunan Province, China
| | - Shulin Wang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities in Hunan Province, China
| | - Dan Song
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities in Hunan Province, China
| |
|
42
|
Yu F, Yang Z, Tang N, Lin H, Wang J, Yang Z. Predicting protein complex in protein interaction network - a supervised learning based method. BMC Systems Biology 2014; 8 Suppl 3:S4. [PMID: 25349902 PMCID: PMC4243764 DOI: 10.1186/1752-0509-8-s3-s4] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
Abstract
Background Protein complexes are important for understanding principles of cellular organization and function. High-throughput experimental techniques have produced a large number of protein interactions, making it possible to predict protein complexes from protein-protein interaction networks (PINs). However, most current methods are unsupervised and cannot use the information in the large number of known complexes. Methods We present a supervised learning-based method for predicting protein complexes in protein-protein interaction networks. The method extracts rich features from both the unweighted and weighted networks to train a Regression model, which is then used for clique filtering, clique growth, and candidate complex filtering. The model utilizes additional "uncertainty" samples and, therefore, is more discriminative when used in the complex detection algorithm. In addition, our method uses the maximal cliques found by the Cliques algorithm as the initial cliques, which has been shown to be more effective than expanding from seed proteins as other methods do. Results The experimental results on several PIN datasets show that in most cases the performance of our method is superior to that of comparable state-of-the-art protein complex detection techniques. Conclusions The results demonstrate several advantages of our method over other state-of-the-art techniques. Firstly, our method is a supervised learning-based method that can make full use of the information in the available known complexes instead of relying only on the topological structure of the PIN. This also means that, if more training samples are provided, our method can achieve better performance than unsupervised methods. Secondly, we design a rich feature set to describe the properties of the known complexes, which includes not only features from the unweighted network but also features from the weighted network built from Gene Ontology information. Thirdly, our Regression model utilizes additional "uncertainty" samples and therefore becomes more discriminative; our experimental results indicate that this benefits complex detection.
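The clique-seeding step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify how the "Cliques algorithm" works, so the code below substitutes the well-known Bron-Kerbosch procedure with pivoting as a stand-in; the function name `maximal_cliques` and the toy network `toy_pin` are hypothetical and are not taken from the paper.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    adj maps each node to the set of its neighbours.
    Stand-in technique: Bron-Kerbosch with pivoting (an assumption,
    not necessarily the "Cliques algorithm" named in the abstract).
    """
    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed nodes.
        if not p and not x:
            yield sorted(r)
            return
        # Pivot on the node with the most neighbours among the candidates
        # so that fewer branches need to be explored.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical five-protein interaction network.
toy_pin = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

# Each maximal clique would serve as an initial seed for later growth.
seeds = sorted(maximal_cliques(toy_pin))
print(seeds)
```

In a pipeline like the one the abstract outlines, each seed would then be scored by the trained model before filtering and growth; that scoring step is not sketched here because the abstract does not give the feature definitions.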
|
43
|
Yang ZH, Yu FY, Lin HF, Wang J. Integrating PPI datasets with the PPI data from biomedical literature for protein complex detection. BMC Med Genomics 2014; 7 Suppl 2:S3. [PMID: 25350598 PMCID: PMC4243118 DOI: 10.1186/1755-8794-7-s2-s3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
Abstract
BACKGROUND Protein complexes are important for understanding principles of cellular organization and function. High-throughput experimental techniques have produced a large amount of protein-protein interactions (PPIs), making it possible to predict protein complexes from protein-protein interaction networks. On the other hand, the rapidly growing biomedical literature provides a significantly large and readily available source of interaction data, which can be integrated into the protein network for better complex detection performance. METHODS We present an approach of integrating PPI datasets with the PPI data from biomedical literature for protein complex detection. The approach applies a sophisticated natural language processing system, PPIExtractor, to extract PPI data from biomedical literature. These data are then integrated into the PPI datasets for complex detection. RESULTS The experimental results of the state-of-the-art complex detection method, ClusterONE, on five yeast PPI datasets verify our method's effectiveness: compared with the original PPI datasets, the average improvements of 3.976 and 5.416 percentage units in the maximum matching ratio (MMR) are achieved on the new networks using the MIPS and SGD gold standards, respectively. In addition, our approach also proves to be effective for three other complex detection algorithms proposed in recent years, i.e. CMC, COACH and RRW. CONCLUSIONS The rapidly growing biomedical literature provides a significantly large, readily available and relatively accurate source of interaction data, which can be integrated into the protein network for better protein complex detection performance.
Affiliation(s)
- Zhi Hao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Feng Ying Yu
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Hong Fei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Jian Wang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, China
|
44
|
Barozai MYK, Bashir F, Muzaffar S, Afzal S, Behlil F, Khan M. In-silico identification and characterization of organic and inorganic chemical stress responding genes in yeast (Saccharomyces cerevisiae). Gene 2014; 550:74-80. [PMID: 25111117 DOI: 10.1016/j.gene.2014.08.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/13/2014] [Revised: 05/31/2014] [Accepted: 08/08/2014] [Indexed: 10/24/2022]
Abstract
Yeast (Saccharomyces cerevisiae) is an important model organism for studying the life processes of eukaryotes. It is also one of the best models for studying gene responses at the transcriptional level. In a living organism, chemical stresses change gene expression. Genes that respond to chemical stresses are a good source for strategies to engineer chemical stress resistance mechanisms in eukaryotic organisms. Microarray data obtained under chemical stresses such as lithium chloride, lactic acid, weak organic acids and tomatidine were studied using computational tools. Out of 9335 yeast genes, 388 chemical stress responding genes were identified and characterized under different chemical stresses. Some of these are: Enolases 1 and 2, heat shock protein-82, Yeast Elongation Factor 3, Beta Glucanase Protein, Histone H2A1 and Histone H2A2 Proteins, Benign Prostatic Hyperplasia, ras GTPase activating protein, Establishes Silent Chromatin protein, Mei5 Protein, Nondisjunction Protein and Specific Mitogen Activated Protein Kinase. These genes were also characterized on the basis of their molecular functions, biological processes and cellular components.
Affiliation(s)
- Farrukh Bashir
- Department of Chemistry, Sardar Bahadur Khan Women's University, Quetta, Pakistan
- Shafia Muzaffar
- Department of Chemistry, Sardar Bahadur Khan Women's University, Quetta, Pakistan
- Saba Afzal
- Department of Chemistry, Sardar Bahadur Khan Women's University, Quetta, Pakistan
- Farida Behlil
- Department of Chemistry, Sardar Bahadur Khan Women's University, Quetta, Pakistan
- Muzaffar Khan
- Department of Chemistry, Sardar Bahadur Khan Women's University, Quetta, Pakistan
|
45
|
Abstract
MOTIVATION Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. RESULTS Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best), which has a <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes not previously linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature. AVAILABILITY Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease.
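The inductive property described in this abstract can be illustrated with a small sketch. It assumes the standard bilinear low-rank form of inductive matrix completion, in which a gene-disease score is the inner product of the gene and disease feature vectors after each is projected into a shared latent space. The factor matrices `W` and `H`, the feature values, and the gene names are all hypothetical; in practice the factors would be learned from observed associations, which is not shown here.

```python
def project(features, factors):
    """Project a feature vector of length d into k latent dimensions.

    factors is a d x k matrix given as a list of rows.
    """
    k = len(factors[0])
    return [sum(f * row[j] for f, row in zip(features, factors))
            for j in range(k)]


def association_score(gene_features, disease_features, W, H):
    """Bilinear score: (gene_features . W) dot (disease_features . H)."""
    gene_latent = project(gene_features, W)
    disease_latent = project(disease_features, H)
    return sum(g * d for g, d in zip(gene_latent, disease_latent))


# Hypothetical learned factors: 3 gene features and 2 disease features,
# each mapped to 2 latent dimensions.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
H = [[2.0, 0.0], [0.0, 1.0]]

# A disease absent from training can still be scored, because only its
# feature vector is needed, not a previously learned row of its own.
new_disease = [1.0, 1.0]
genes = {
    "g1": [1.0, 0.0, 0.0],
    "g2": [0.0, 1.0, 0.0],
    "g3": [0.0, 0.0, 2.0],
}

ranked = sorted(
    genes,
    key=lambda name: association_score(genes[name], new_disease, W, H),
    reverse=True,
)
print(ranked)
```

A transductive matrix-completion model would instead hold one learned latent vector per disease seen in training, so it would have nothing to score `new_disease` with; that contrast is the point of the sketch.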
Affiliation(s)
- Nagarajan Natarajan
- Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
- Inderjit S Dhillon
- Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
|
46
|
Ji JZ, Jiao L, Yang CC, Lv JW, Zhang AD. MAE-FMD: multi-agent evolutionary method for functional module detection in protein-protein interaction networks. BMC Bioinformatics 2014; 15:325. [PMID: 25265982 PMCID: PMC4262229 DOI: 10.1186/1471-2105-15-325] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/10/2014] [Accepted: 09/22/2014] [Indexed: 11/24/2022]
Abstract
Background Studies of functional modules in a Protein-Protein Interaction (PPI) network contribute greatly to the understanding of biological mechanisms. With the development of computing science, computational approaches have played an important role in detecting functional modules. Results We present a new approach using multi-agent evolution for detection of functional modules in PPI networks. The proposed approach consists of two stages: solution construction for the agents in a population, and the evolutionary process of computational agents in a lattice environment, where each agent corresponds to a candidate solution to the problem of detecting functional modules in a PPI network. First, the approach uses a connection-based encoding scheme to model an agent, and employs a random-walk behavior that merges topological characteristics with functional information to construct a solution. Next, it applies several evolutionary operators, i.e., competition, crossover, and mutation, to realize information exchange among agents as well as solution evolution. Systematic experiments have been conducted on three benchmark testing sets of yeast networks. Experimental results show that the approach is more effective than several other existing algorithms. Conclusions The algorithm achieves outstanding recall, F-measure, sensitivity and accuracy while remaining competitive on other measures, so it can be applied to biological studies that require high accuracy.
Affiliation(s)
- Jun Zhong Ji
- College of Computer Science, Beijing University of Technology, Chaoyang District, Beijing, China.
|
47
|
Exploring function prediction in protein interaction networks via clustering methods. PLoS One 2014; 9:e99755. [PMID: 24972109 PMCID: PMC4074043 DOI: 10.1371/journal.pone.0099755] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/21/2014] [Accepted: 05/17/2014] [Indexed: 01/06/2023]
Abstract
Complex networks have recently become the focus of research in many fields. Their structure reveals crucial information about the nodes and about how they connect and share information. In our work we analyze protein interaction networks as complex networks for their functional modular structure and later use that information in the functional annotation of proteins within the network. We propose several graph representations for the protein interaction network, each with a different level of complexity and a different degree of inclusion of the annotation information within the graph. We aim to explore the benefits and drawbacks of these proposed graphs when they are used in the function prediction process via clustering methods. To make this cluster-based prediction, we adopt well-established approaches for cluster detection in complex networks, using recent representative algorithms that have proven efficient for the task at hand. The experiments are performed using a purified and reliable Saccharomyces cerevisiae protein interaction network, which is then used to generate the different graph representations. Each of the graph representations is then analysed in combination with each of the clustering algorithms, which have been modified where necessary and implemented to fit the specific graph. We evaluate results with regard to biological validity and function prediction performance. Our results indicate that the novel ways of presenting the complex graph improve the prediction process, although the computational complexity should be taken into account when deciding on a particular approach.
|
48
|
Gotoh O, Morita M, Nelson DR. Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment. BMC Bioinformatics 2014; 15:189. [PMID: 24927652 PMCID: PMC4065584 DOI: 10.1186/1471-2105-15-189] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 03/19/2014] [Accepted: 06/09/2014] [Indexed: 03/29/2024]
Abstract
Background Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due to the scarcity of proper assessment methods. Results We present a gene-structure-aware multiple sequence alignment method for gene prediction using amino acid sequences translated from homologous genes from many genomes. The approach provides rich information concerning the reliability of each predicted gene structure. We have also devised an iterative method that attempts to improve the structures of suspiciously predicted genes based on a spliced alignment algorithm using consensus sequences or reliable homologs as templates. Application of our methods to cytochrome P450 and ribosomal proteins from 47 plant genomes indicated that 50-60% of the annotated gene structures are likely to contain some defects. Whereas more than half of the defect-containing genes may be intrinsically broken, i.e. they are pseudogenes or gene fragments, located in unfinished sequencing areas, or corresponding to non-productive isoforms, the defects found in a majority of the remaining gene candidates can be remedied by our iterative refinement method. Conclusions Refinement of eukaryotic gene structures mediated by gene-structure-aware multiple protein sequence alignment is a useful strategy to dramatically improve the overall prediction quality of a set of homologous genes. Our method will be applicable to various families of protein-coding genes if their domain structures are evolutionarily stable. It is also feasible to apply our method to gene families from all kingdoms of life, not just plants.
Affiliation(s)
- Osamu Gotoh
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan.
|
49
|
A replication study for genome-wide gene expression levels in two layer lines elucidates differentially expressed genes of pathways involved in bone remodeling and immune responsiveness. PLoS One 2014; 9:e98350. [PMID: 24922511 PMCID: PMC4055560 DOI: 10.1371/journal.pone.0098350] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/22/2013] [Accepted: 05/01/2014] [Indexed: 11/19/2022]
Abstract
The current replication study confirmed significant differences in gene expression profiles of the cerebrum between the two commercial layer lines Lohmann Selected Leghorn (LSL) and Lohmann Brown (LB). Microarray analyses were performed for 30 LSL and another 30 LB laying hens kept in the small group housing system Eurovent German. Of the 20,399 probe sets on the customized Affymetrix ChiGene-1_0-st Arrays, a total of 14,103 were differentially expressed between the two layer lines LSL and LB (FDR adjusted P-value <0.05). A change of at least 2-fold in expression levels was observed for 388 of these probe sets. In LSL, 214 of the 388 probe sets were down-regulated and 174 were up-regulated, and vice versa for the LB layer line. Among the 174 up-regulated probe sets in LSL, we identified 51 significantly enriched Gene Ontology (GO) terms of the biological process category. A total of 63 enriched GO terms were identified for the 214 down-regulated probe sets of the layer line LSL. We identified nine genes significantly differentially expressed between the two layer lines in both microarray experiments. These genes play a crucial role in protection of neuronal cells from oxidative stress, bone mineral density and immune response. Thus, the different regulation of these genes may significantly contribute to phenotypic trait differences between these layer lines. In conclusion, these novel findings provide a basis for further research to improve animal welfare in laying hens, and these layer lines may be of general interest as an animal model.
|
50
|
Lopes FM, Ray SS, Hashimoto RF, Cesar RM. Entropic Biological Score: a cell cycle investigation for GRNs inference. Gene 2014; 541:129-37. [DOI: 10.1016/j.gene.2014.03.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/13/2013] [Revised: 02/17/2014] [Accepted: 03/05/2014] [Indexed: 12/21/2022]
|