1
Ferrarini MG, Vallier A, Vincent-Monégat C, Dell'Aglio E, Gillet B, Hughes S, Hurtado O, Condemine G, Zaidman-Rémy A, Rebollo R, Parisot N, Heddi A. Coordination of host and endosymbiont gene expression governs endosymbiont growth and elimination in the cereal weevil Sitophilus spp. MICROBIOME 2023; 11:274. [PMID: 38087390 PMCID: PMC10717185 DOI: 10.1186/s40168-023-01714-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/27/2023] [Accepted: 10/30/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND Insects living in nutritionally poor environments often establish long-term relationships with intracellular bacteria that supplement their diets and improve their adaptive and invasive powers. Even though these symbiotic associations have been extensively studied on physiological, ecological, and evolutionary levels, few studies have focused on the molecular dialogue between host and endosymbionts to identify genes and pathways involved in endosymbiosis control and dynamics throughout host development. RESULTS We simultaneously analyzed host and endosymbiont gene expression during the life cycle of the cereal weevil Sitophilus oryzae, from larval stages to adults, with a particular emphasis on emerging adults, in which the endosymbiont Sodalis pierantonius undergoes a contrasting growth-climax-elimination dynamic. We uncovered a constant arms race in which different biological functions are intertwined and coregulated across both partners. These include immunity, metabolism, metal control, apoptosis, and bacterial stress response. CONCLUSIONS The study of these tightly regulated functions, which are at the center of symbiotic regulation, provides evidence of how hosts and bacteria finely tune their gene expression and respond to different physiological challenges constrained by insect development in a nutritionally limited ecological niche.
Affiliation(s)
- Mariana Galvão Ferrarini
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621, Villeurbanne, France
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622, Villeurbanne, France
- Agnès Vallier
  - Univ Lyon, INRAE, INSA Lyon, BF2I, UMR 203, 69621, Villeurbanne, France
- Elisa Dell'Aglio
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621, Villeurbanne, France
- Benjamin Gillet
  - Institut de Génomique Fonctionnelle de Lyon (IGFL), CNRS UMR 5242, Ecole Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Sandrine Hughes
  - Institut de Génomique Fonctionnelle de Lyon (IGFL), CNRS UMR 5242, Ecole Normale Supérieure de Lyon, Université de Lyon, Lyon, France
- Ophélie Hurtado
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621, Villeurbanne, France
- Guy Condemine
  - Univ Lyon, Université Lyon 1, INSA de Lyon, CNRS UMR 5240 Microbiologie Adaptation et Pathogénie, Villeurbanne, France
- Anna Zaidman-Rémy
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621, Villeurbanne, France
  - Institut universitaire de France (IUF), Paris, France
- Rita Rebollo
  - Univ Lyon, INRAE, INSA Lyon, BF2I, UMR 203, 69621, Villeurbanne, France
- Nicolas Parisot
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621, Villeurbanne, France
- Abdelaziz Heddi
  - Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621, Villeurbanne, France

2
Siddique S, Radakovic ZS, Hiltl C, Pellegrin C, Baum TJ, Beasley H, Bent AF, Chitambo O, Chopra D, Danchin EGJ, Grenier E, Habash SS, Hasan MS, Helder J, Hewezi T, Holbein J, Holterman M, Janakowski S, Koutsovoulos GD, Kranse OP, Lozano-Torres JL, Maier TR, Masonbrink RE, Mendy B, Riemer E, Sobczak M, Sonawala U, Sterken MG, Thorpe P, van Steenbrugge JJM, Zahid N, Grundler F, Eves-van den Akker S. The genome and lifestage-specific transcriptomes of a plant-parasitic nematode and its host reveal susceptibility genes involved in trans-kingdom synthesis of vitamin B5. Nat Commun 2022; 13:6190. [PMID: 36261416 PMCID: PMC9582021 DOI: 10.1038/s41467-022-33769-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 10/08/2021] [Accepted: 09/30/2022] [Indexed: 12/24/2022]
Abstract
Plant-parasitic nematodes are a major threat to crop production in all agricultural systems. The scarcity of classical resistance genes highlights a pressing need to find new ways to develop nematode-resistant germplasm. Here, we sequence and assemble a high-quality phased genome of the model cyst nematode Heterodera schachtii to provide a platform for the first system-wide dual analysis of host and parasite gene expression over time, covering all major parasitism stages. Analysis of the hologenome of the plant-nematode infection site identified metabolic pathways that were incomplete in the parasite but complemented by the host. Using a combination of bioinformatic, genetic, and biochemical approaches, we show that a highly atypical completion of vitamin B5 biosynthesis by the parasitic animal, putatively enabled by a horizontal gene transfer from a bacterium, is required for full pathogenicity. Knockout of either plant-encoded or now nematode-encoded steps in the pathway significantly reduces parasitic success. Our experiments establish a reference for cyst nematodes, further our understanding of the evolution of plant-parasitism by nematodes, and show that congruent differential expression of metabolic pathways in the infection hologenome represents a new way to find nematode susceptibility genes. The approach identifies genome-editing-amenable targets for future development of nematode-resistant crops.
Affiliation(s)
- Shahid Siddique
  - Department of Entomology and Nematology, University of California Davis, One Shields Avenue, Davis, CA, 95616, USA
- Zoran S Radakovic
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany
  - P.H. Petersen Saatzucht Lundsgaard GmbH, D-24977, Grundhof, Germany
- Clarissa Hiltl
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany
- Clement Pellegrin
  - The Crop Science Centre, Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
- Thomas J Baum
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, 50011, USA
- Helen Beasley
  - The Crop Science Centre, Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
- Andrew F Bent
  - Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Oliver Chitambo
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany
- Divykriti Chopra
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany
- Etienne G J Danchin
  - Université Côte d'Azur, INRAE, CNRS, Institut Sophia Agrobiotech, Sophia-Antipolis, France
- Eric Grenier
  - IGEPP, INRAE, Institut Agro, Université Rennes, 35650, Le Rheu, France
- Samer S Habash
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany
  - BASF Vegetable Seeds, Napoleonsweg 152, 6083 AB, Nunhem, The Netherlands
- M Shamim Hasan
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany
- Johannes Helder
  - Laboratory of Nematology, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Tarek Hewezi
  - Department of Plant Sciences, University of Tennessee, Knoxville, TN, 37996, USA
- Julia Holbein
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany
- Martijn Holterman
  - Laboratory of Nematology, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
  - Solynta, Dreijenlaan 2, 6703 HA, Wageningen, The Netherlands
- Sławomir Janakowski
  - Department of Botany, Institute of Biology, Warsaw University of Life Sciences (SGGW), Nowoursynowska 159, 02-787, Warsaw, Poland
- Olaf P Kranse
  - The Crop Science Centre, Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
- Jose L Lozano-Torres
  - Laboratory of Nematology, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Tom R Maier
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, 50011, USA
- Rick E Masonbrink
  - Genome Informatics Facility, Iowa State University, Ames, IA, 50010, USA
- Badou Mendy
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany
- Esther Riemer
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany
- Mirosław Sobczak
  - Department of Botany, Institute of Biology, Warsaw University of Life Sciences (SGGW), Nowoursynowska 159, 02-787, Warsaw, Poland
- Unnati Sonawala
  - The Crop Science Centre, Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
- Mark G Sterken
  - Laboratory of Nematology, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Peter Thorpe
  - Mackenzie Institute for Early Diagnosis, School of Medicine, University of St Andrews, North Haugh, St Andrews, KY16 9TF, UK
- Joris J M van Steenbrugge
  - Laboratory of Nematology, Wageningen University & Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
- Nageena Zahid
  - Institute for Microbiology and Biotechnology, Rheinische Friedrich-Wilhelms-University of Bonn, Meckenheimer Allee 168, D-53115, Bonn, Germany
- Florian Grundler
  - Rheinische Friedrich-Wilhelms-University of Bonn, INRES - Molecular Phytomedicine, Karlrobert-Kreiten-Straße 13, D-53115, Bonn, Germany

3
A Pattern New in Every Moment: The Temporal Clustering of Markets for Crude Oil, Refined Fuels, and Other Commodities. ENERGIES 2021. [DOI: 10.3390/en14196099] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
The identification of critical periods and business cycles contributes significantly to the analysis of financial markets and the macroeconomy. Financialization and cointegration place a premium on the accurate recognition of time-varying volatility in commodity markets, especially those for crude oil and refined fuels. This article seeks to identify critical periods in the trading of energy-related commodities as a step toward understanding the temporal dynamics of those markets. To that end, it proposes a novel application of unsupervised machine learning: a suite of clustering methods, applied to conditional volatility forecasts by trading day and by individual asset or asset class, can identify critical periods in energy-related commodity markets. Unsupervised machine learning achieves this task without rules-based or subjective definitions of crises. Five clustering methods (affinity propagation, mean-shift, spectral, k-means, and hierarchical agglomerative clustering) can identify anomalous periods in commodities trading. These methods identified the financial crisis of 2008–2009 and the initial stages of the COVID-19 pandemic. Applied to four energy-related markets (Brent, West Texas Intermediate, gasoil, and gasoline), the same methods identified additional periods connected to events such as the September 11 terrorist attacks and the 2003 Persian Gulf war. The t-distributed stochastic neighbor embedding (t-SNE) technique facilitates the visualization of trading regimes. Temporal clustering of conditional volatility forecasts reveals unusual financial properties that distinguish the trading of energy-related commodities during critical periods from trading during normal periods and from trade in other commodities in all periods. Whereas critical periods for all commodities appear to coincide with broader disruptions in demand for energy, critical periods unique to crude oil and refined fuels appear to arise from acute disruptions in supply.
Extensions of these methods include the definition of bull and bear markets and the identification of recessions and recoveries in the real economy.
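The abstract does not name the model behind its conditional volatility forecasts. As a minimal sketch, assuming a GARCH(1,1) model (a common choice for commodity returns, not necessarily the one used in the article), the per-day forecasts can be computed recursively and stacked into one feature vector per trading day. The function name, the return series, and the parameters `omega`, `alpha`, and `beta` below are hypothetical illustrations, not estimates from the study.

```python
def garch11_variances(returns, omega, alpha, beta, init_var=None):
    """Conditional variance forecasts from a GARCH(1,1) recursion (illustrative sketch).

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    Element t is the forecast for day t using returns up to day t-1, so the
    last element is the one-step-ahead forecast beyond the sample.
    """
    if not returns:
        raise ValueError("need at least one return")
    if init_var is None:
        # Seed the recursion with the sample variance (one common convention).
        mean = sum(returns) / len(returns)
        init_var = sum((r - mean) ** 2 for r in returns) / len(returns)
    sigma2 = [init_var]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Hypothetical daily log returns for two assets; each forecast day becomes a
# feature vector of per-asset volatilities that a clustering method could group.
brent = [0.010, -0.020, 0.015, -0.030]
wti = [0.012, -0.018, 0.020, -0.035]
vols = [
    [v ** 0.5 for v in garch11_variances(r, omega=1e-6, alpha=0.1, beta=0.85, init_var=1e-4)]
    for r in (brent, wti)
]
daily_features = list(zip(*vols))  # one (brent_vol, wti_vol) tuple per forecast day
```

Passing `daily_features` to any of the clustering methods named above would then group trading days into volatility regimes; the point of the sketch is the data layout (days as rows, assets as columns), not the particular volatility model.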

4
Chen JM, Zovko M, Šimurina N, Zovko V. Fear in a Handful of Dust: The Epidemiological, Environmental, and Economic Drivers of Death by PM2.5 Pollution. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:8688. [PMID: 34444435 PMCID: PMC8393768 DOI: 10.3390/ijerph18168688] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2021] [Revised: 08/03/2021] [Accepted: 08/14/2021] [Indexed: 01/13/2023]
Abstract
This study evaluates numerous epidemiological, environmental, and economic factors affecting morbidity and mortality from PM2.5 exposure in the 27 member states of the European Union. This form of air pollution inflicts considerable social and economic damage in addition to loss of life and well-being. This study creates and deploys a comprehensive data pipeline. The first step consists of conventional linear models and supervised machine learning alternatives. Those regression methods do more than predict health outcomes in the EU-27 and relate those predictions to independent variables. Linear regression and its machine learning equivalents also inform unsupervised machine learning methods such as clustering and manifold learning. Lower-dimension manifolds of this dataset's feature space reveal the relationship among EU-27 countries and their success (or failure) in managing PM2.5 morbidity and mortality. Principal component analysis informs further interpretation of variables along economic and health-based lines. A nonlinear environmental Kuznets curve may describe the fuller relationship between economic activity and premature death from PM2.5 exposure. The European Union should bridge the historical, cultural, and economic gaps that impair these countries' collective response to PM2.5 pollution.
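The abstract does not give the functional form of its Kuznets curve. The conventional environmental Kuznets curve specification is quadratic in income, so as a worked sketch under that assumption, with D_i denoting premature PM2.5 deaths and y_i economic activity (for example GDP per capita) in country i, the turning point follows from setting the derivative to zero. The symbols are hypothetical placeholders, not the study's variables.

```latex
D_i = \beta_0 + \beta_1 y_i + \beta_2 y_i^2 + \varepsilon_i,
\qquad
\frac{\partial D_i}{\partial y_i} = \beta_1 + 2\beta_2 y_i = 0
\;\Longrightarrow\;
y^{*} = -\frac{\beta_1}{2\beta_2}.
```

With beta_1 > 0 and beta_2 < 0 the curve is an inverted U: mortality rises with economic activity up to y* and declines beyond it.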
Affiliation(s)
- James Ming Chen
  - College of Law, Michigan State University, East Lansing, MI 48824, USA
- Mira Zovko
  - Ministry of Economy and Sustainable Development, 10000 Zagreb, Croatia
- Nika Šimurina
  - Faculty of Economics & Business, University of Zagreb, 10000 Zagreb, Croatia
- Vatroslav Zovko
  - Faculty of Teacher Education, University of Zagreb, 10000 Zagreb, Croatia

5
Alvarez JM, Brooks MD, Swift J, Coruzzi GM. Time-Based Systems Biology Approaches to Capture and Model Dynamic Gene Regulatory Networks. ANNUAL REVIEW OF PLANT BIOLOGY 2021; 72:105-131. [PMID: 33667112 PMCID: PMC9312366 DOI: 10.1146/annurev-arplant-081320-090914] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 05/13/2023]
Abstract
All aspects of transcription and its regulation involve dynamic events. However, capturing these dynamic events in gene regulatory networks (GRNs) offers both a promise and a challenge. The promise is that capturing and modeling the dynamic changes in GRNs will allow us to understand how organisms adapt to a changing environment. The ability to mount a rapid transcriptional response to environmental changes is especially important in nonmotile organisms such as plants. The challenge is to capture these dynamic, genome-wide events and model them in GRNs. In this review, we cover recent progress in capturing dynamic interactions of transcription factors with their targets, at both the local and genome-wide levels, and how these interactions are used to learn how GRNs operate as a function of time. We also discuss recent advances that employ time-based machine learning approaches to forecast gene expression at future time points, a key goal of systems biology.
Affiliation(s)
- Jose M Alvarez
  - Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
  - ANID-Millennium Science Initiative Program-Millennium Institute for Integrative Biology (iBio), Santiago, Chile
- Matthew D Brooks
  - Global Change and Photosynthesis Research Unit, US Department of Agriculture Agricultural Research Service, Urbana, Illinois 61801, USA
- Joseph Swift
  - Salk Institute for Biological Studies, La Jolla, California 92037, USA
- Gloria M Coruzzi
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA

6
Abstract
Background:
The evolutionary history of organisms can be described by phylogenetic trees. Comparing the topologies of rooted phylogenetic trees is necessary when researching the evolution of a given set of species.
Objective:
Several metrics currently measure the dissimilarity between rooted phylogenetic trees, and these metrics are defined in different ways.
Methods:
This paper analyzes these metrics both through their definitions and through the distance values they compute in experiments.
Results:
The experimental results show that the distances calculated by the cluster metric, the partition metric, and the equivalent metric are well fitted by a Gaussian distribution, and that the equivalent metric describes the difference between trees better than the others.
Conclusion:
The paper also presents a tool called CDRPT (Computing Distance for Rooted Phylogenetic Trees), a web server that calculates distances between trees online. CDRPT can also be used offline by installing application packages for Windows, which greatly facilitates its use by researchers. The home page of CDRPT is http://bioinformatics.imu.edu.cn/tree/.
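The abstract names the cluster metric without defining it. As a minimal sketch, assuming the usual definition for rooted trees (the number of clusters found in exactly one of the two trees, where a cluster is the set of leaves below an internal node), it can be computed as shown here. Trees are written as nested Python tuples of leaf labels; the function names are illustrative and not part of CDRPT, which may normalize or define the metric differently.

```python
def clusters(tree):
    """Collect the leaf set (cluster) below every internal node of a rooted tree.

    A tree is either a leaf label (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)  # record the cluster defined by this internal node
        return below

    leaves(tree)
    return found


def cluster_distance(t1, t2):
    """Cluster metric: number of clusters present in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
```

Here `t1` contains the cluster {a, b} and `t2` contains {a, c}; both share {a, b, c} and the root cluster, so `cluster_distance(t1, t2)` counts only the two unmatched clusters.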
Affiliation(s)
- Juan Wang
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Xinyue Qi
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Bo Cui
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China

7
Li J, Chang M, Gao Q, Song X, Gao Z. Lung Cancer Classification and Gene Selection by Combining Affinity Propagation Clustering and Sparse Group Lasso. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017103557] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/03/2023]
Abstract
Background:
Cancer is a serious threat to human health, and diagnosing cancer via gene expression analysis is a hot topic in cancer research.
Objective:
The study aimed to accurately diagnose the type of lung cancer and to discover pathogenic genes.
Methods:
Affinity Propagation (AP) clustering with a similarity score was applied to each type of lung cancer and to normal lung tissue. After the genes were grouped, sparse group lasso was used to construct four binary classifiers, which were integrated by a voting strategy.
Results:
Among 73 gene groups, this study screened six that may be associated with different lung cancer subtypes and identified three possible key pathogenic genes: KRAS, BRAF and VDR. Furthermore, it achieved higher classification accuracies for the minority classes SQ and COID than four other methods.
Conclusion:
We propose AP clustering-based sparse group lasso (AP-SGL), which provides an alternative for simultaneous diagnosis and gene selection in lung cancer.
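The abstract does not describe how the sparse group lasso classifiers are fitted. Solvers for this penalty commonly rely on its proximal operator, which first soft-thresholds each coefficient (the lasso part) and then shrinks each gene group toward zero as a block (the group part). The sketch below assumes the penalty lam1 * ||b||_1 + lam2 * sum_g sqrt(p_g) * ||b_g||_2 with group-size weights sqrt(p_g); it illustrates that operator only and is not the AP-SGL implementation. `groups` is a hypothetical stand-in for gene groups such as those produced by AP clustering.

```python
import math


def soft_threshold(x, t):
    """Elementwise lasso shrinkage: sign(x) * max(|x| - t, 0)."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sgl_prox(beta, groups, lam1, lam2, step):
    """Proximal operator of step * (lam1*||b||_1 + lam2 * sum_g sqrt(p_g) * ||b_g||_2).

    beta: list of coefficients; groups: list of index lists that partition beta.
    """
    out = [0.0] * len(beta)
    for idx in groups:
        # Lasso part: shrink each coefficient in the group individually.
        u = [soft_threshold(beta[j], step * lam1) for j in idx]
        norm = math.sqrt(sum(v * v for v in u))
        if norm == 0.0:
            continue  # the whole group is already zero
        # Group part: shrink the group as a block; it vanishes if its norm is too small.
        weight = math.sqrt(len(idx))
        scale = max(0.0, 1.0 - step * lam2 * weight / norm)
        for j, v in zip(idx, u):
            out[j] = scale * v
    return out


beta = [3.0, -0.5, 0.2, 0.1]
groups = [[0, 1], [2, 3]]  # hypothetical gene groups
shrunk = sgl_prox(beta, groups, lam1=0.5, lam2=1.0, step=1.0)
```

In this example the second group's coefficients all fall below the lasso threshold, so the whole group is set to zero; this block-wise elimination is what lets a sparse group lasso discard entire gene groups while still selecting individual genes within the groups it keeps.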
Affiliation(s)
- Juntao Li
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Mingming Chang
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Qinghui Gao
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Xuekun Song
  - School of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China
- Zhiyu Gao
  - School of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China

8
Clink DJ, Klinck H. Unsupervised acoustic classification of individual gibbon females and the implications for passive acoustic monitoring. Methods Ecol Evol 2020. [DOI: 10.1111/2041-210x.13520] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/29/2023]
Affiliation(s)
- Dena J. Clink
  - Center for Conservation Bioacoustics, Cornell Laboratory of Ornithology, Cornell University, Ithaca, NY, USA
- Holger Klinck
  - Center for Conservation Bioacoustics, Cornell Laboratory of Ornithology, Cornell University, Ithaca, NY, USA

9
Zhou Y, Zhang Y, Liang T, Wang L. Shifting of phytoplankton assemblages in a regulated Chinese river basin after streamflow and water quality changes. THE SCIENCE OF THE TOTAL ENVIRONMENT 2019; 654:948-959. [PMID: 30841412 DOI: 10.1016/j.scitotenv.2018.10.348] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/04/2018] [Revised: 10/07/2018] [Accepted: 10/26/2018] [Indexed: 06/09/2023]
Abstract
Phytoplankton are critical to river ecosystems. Because these organisms are sensitive to changes in streamflow and water quality, they are used to assess the stability of river ecosystems, especially in regulated rivers. However, exactly how such disturbances alter the spatial distribution of phytoplankton remains unclear, particularly across seasons. A thorough understanding of these mechanisms is required to better analyze the impact of environmental factors on regulated rivers. Therefore, phytoplankton communities, streamflow, and water quality factors were assessed at sites sampled four times from 2015 to 2016 in the upper and middle Huai River Basin. Biodiversity indices, as well as cluster and rank analyses, were used to (1) determine phytoplankton composition and distribution and (2) clarify the impacts of key streamflow and water quality factors on these communities. Phytoplankton composition deteriorated over time, with the number of phyla decreasing from six to three, while the proportion of Bacillariophyta increased from 51.83% to 68.13%. Phytoplankton in three regions were spatially clustered: upstream (Shannon-Wiener index 1.39–2.95), midstream (0.70–4.55), and downstream (0.22–2.97). Water quality factors had the greatest impact on variation in composition and distribution, followed by hydrological factors. Of these, the most important factors in wet seasons were total nitrogen and maximum runoff, whereas ammonia nitrogen and low-flow discharge were the most important during dry seasons. Streamflow and water quality contributed the most in the midstream region, which was significantly affected by the numbers of high and low flows. Their contributions in the downstream region were strongest during dry seasons, when that region was significantly affected by the number of low flows.
Collectively, these results reveal the significant impact of streamflow and water quality factors on phytoplankton deterioration in the upper and middle Huai River Basin. Critically, this study provides scientific and technological support for increased biomonitoring and ecohydrological studies in regulated river basins.
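The Shannon-Wiener index reported for each region is H' = -sum_i p_i log(p_i), where p_i is the proportion of individuals belonging to taxon i. A minimal sketch follows; the abstract does not state the logarithm base (the natural log and log base 2 are both common), so it is left as a parameter, and the counts in the example are hypothetical.

```python
import math


def shannon_wiener(counts, base=math.e):
    """Shannon-Wiener diversity H' = -sum(p_i * log(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Taxa with zero individuals contribute nothing (p * log p -> 0 as p -> 0).
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


even = shannon_wiener([10, 10, 10, 10])        # four equally abundant taxa
dominated = shannon_wiener([37, 1, 1, 1])      # one taxon dominates
```

Four equally abundant taxa give ln 4 (about 1.386) with the natural log, or 2 with base 2, while a community dominated by one taxon scores much lower; this is why a drop in the index tracks the loss of phyla described above.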
Affiliation(s)
- Yujian Zhou
  - Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yongyong Zhang
  - Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
- Tao Liang
  - University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Land Surface and Simulation, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
- Lingqing Wang
  - Key Laboratory of Land Surface and Simulation, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

10
Brusco MJ, Steinley D, Stevens J, Cradit JD. Affinity propagation: An exemplar-based tool for clustering in psychological research. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2019; 72:155-182. [PMID: 29633235 DOI: 10.1111/bmsp.12136] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/25/2017] [Revised: 01/26/2018] [Indexed: 06/08/2023]
Abstract
Affinity propagation is a message-passing clustering procedure that has received widespread attention in domains such as biological science, physics, and computer science. However, its use in psychology and related areas of social science has been comparatively scant. In this paper, we describe the basic principles of affinity propagation, its relationship to other clustering problems, and the types of data for which it can be used for cluster analysis. More importantly, we identify the strengths and weaknesses of affinity propagation as a clustering tool in general and highlight potential opportunities for its use in psychological research. Numerical examples are provided to illustrate the method.
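The message passing in affinity propagation alternates two updates: responsibilities r(i, k), which measure how well candidate k suits as the exemplar of point i relative to the best alternative, and availabilities a(i, k), which accumulate evidence that k should be an exemplar at all. The pure-Python sketch below follows the standard Frey and Dueck updates with damping. It is illustrative rather than the implementation used in the paper: it runs a fixed number of iterations instead of testing for convergence, and it sets the preference (the diagonal of S) to the median off-diagonal similarity, a common default. The function names and the example points are hypothetical.

```python
import statistics


def with_median_preference(S):
    """Copy S, setting every diagonal entry (the preference) to the median off-diagonal similarity."""
    n = len(S)
    off_diagonal = [S[i][k] for i in range(n) for k in range(n) if i != k]
    pref = statistics.median(off_diagonal)
    return [[pref if i == k else S[i][k] for k in range(n)] for i in range(n)]


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of its exemplar (None if no exemplar emerged).

    S[i][k] is the similarity of point i to candidate exemplar k; needs at least two points.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]
            for i in range(n):
                new = support if i == k else min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [None] * n
    # Exemplars label themselves; every other point joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]


# Hypothetical one-dimensional data with two well-separated groups.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
S = with_median_preference([[-(a - b) ** 2 for b in points] for a in points])
labels = affinity_propagation(S)
```

Unlike k-means, the number of clusters is not fixed in advance: raising the preference yields more exemplars and lowering it yields fewer. On the example data the method should place the three points near 0 and the three near 5 in separate clusters.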
Affiliation(s)
- Michael J Brusco
  - Department of Business Analytics, Information Systems, and Supply Chain, Florida State University, Tallahassee, Florida, USA
- Douglas Steinley
  - Department of Psychological Sciences, University of Missouri, Columbia, Missouri, USA
- Jordan Stevens
  - Department of Psychological Sciences, University of Missouri, Columbia, Missouri, USA
- J Dennis Cradit
  - Department of Business Analytics, Information Systems, and Supply Chain, Florida State University, Tallahassee, Florida, USA

11
Dickinson E, Rusilowicz MJ, Dickinson M, Charlton AJ, Bechtold U, Mullineaux PM, Wilson J. Integrating transcriptomic techniques and k-means clustering in metabolomics to identify markers of abiotic and biotic stress in Medicago truncatula. Metabolomics 2018; 14:126. [PMID: 30830458 PMCID: PMC6153691 DOI: 10.1007/s11306-018-1424-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/24/2018] [Accepted: 09/03/2018] [Indexed: 11/06/2022]
Abstract
INTRODUCTION Nitrogen-fixing legumes are invaluable crops, but are sensitive to physical and biological stresses. Whilst drought and infection from the soil-borne pathogen Fusarium oxysporum have been studied individually, their combined effects have not been widely investigated. OBJECTIVES We aimed to determine the effect of combined stress, using methods usually associated with transcriptomics to detect metabolic differences between treatment groups that could not be identified by more traditional means, such as principal component analysis and partial least squares discriminant analysis. METHODS Liquid chromatography-high resolution mass spectrometry data from the roots and leaves of the model legume Medicago truncatula were analysed using the Gaussian Process 2-Sample Test, k-means cluster analysis and temporal clustering by affinity propagation. RESULTS Metabolic differences were detected: we identified known stress markers, including changes in concentration for sucrose and citric acid, and showed that combined stress can exacerbate the effect of drought. Changes in roots were found to be smaller than those in leaves, but differences due to Fusarium infection were identified. The transfer of sucrose from leaves to roots can be seen when transcriptomic techniques are applied to the metabolomics time series. Other metabolites whose concentrations change as a result of treatment include phosphoric acid, malic acid and tetrahydroxychalcone. CONCLUSIONS Probing metabolomic data with transcriptomic tools provides new insights and could help to identify resilient plant varieties, thereby increasing future crop yield and improving food security.
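The k-means step groups metabolite profiles by their proximity to cluster centroids. A minimal pure-Python sketch of Lloyd's algorithm follows, assuming each point is one metabolite's intensity profile across time points and that the caller supplies the initial centroids. The paper's exact variant, distance, and initialization are not stated in the abstract, and the function names and example profiles are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid recomputation."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical relative concentrations at four time points: two rising, two falling.
profiles = [(1.0, 1.2, 1.5, 1.9), (0.9, 1.1, 1.6, 2.0), (2.0, 1.5, 1.1, 0.8), (2.1, 1.4, 1.0, 0.7)]
labels, centroids = kmeans(profiles, initial_centroids=[profiles[0], profiles[2]])
```

With these profiles the two rising metabolites fall into one cluster and the two falling ones into the other, and each centroid is the mean temporal profile of its cluster. Because k-means depends on its starting centroids, practical use typically compares several initializations.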
Affiliation(s)
- Ulrike Bechtold
  - School of Biological Sciences, University of Essex, Colchester, CO4 3SQ, UK
- Julie Wilson
  - Department of Mathematics, University of York, York, YO1 5DD, UK

12
Polanski K, Gao B, Mason SA, Brown P, Ott S, Denby KJ, Wild DL. Bringing numerous methods for expression and promoter analysis to a public cloud computing service. Bioinformatics 2018; 34:884-886. [PMID: 29126246 PMCID: PMC6030968 DOI: 10.1093/bioinformatics/btx692] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/17/2017] [Accepted: 11/03/2017] [Indexed: 12/24/2022]
Abstract
Summary Every year, a large number of novel algorithms are introduced to the scientific community for a myriad of applications, but using them across different research groups is often troublesome because of suboptimal implementations and specific dependency requirements. This need not be the case, as public cloud computing services can easily house tractable implementations within self-contained dependency environments, making the methods easily accessible to a wider public. We have taken 14 popular methods, most of them related to expression data or promoter analysis, brought them up to a good implementation standard, and housed the tools in isolated Docker containers, which we integrated into the CyVerse Discovery Environment, making them easily usable by a wide community as part of the CyVerse UK project. Availability and implementation The integrated apps can be found at http://www.cyverse.org/discovery-environment, while the raw code is available at https://github.com/cyversewarwick and the corresponding Docker images are housed at https://hub.docker.com/r/cyversewarwick/. Contact info@cyverse.warwick.ac.uk or D.L.Wild@warwick.ac.uk. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Paul Brown
  - Department of Mathematics
  - Systems Biology Centre
- Sascha Ott
  - Systems Biology Centre
  - Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK

13
Fischer S, Freuling CM, Müller T, Pfaff F, Bodenhofer U, Höper D, Fischer M, Marston DA, Fooks AR, Mettenleiter TC, Conraths FJ, Homeier-Bachmann T. Defining objective clusters for rabies virus sequences using affinity propagation clustering. PLoS Negl Trop Dis 2018; 12:e0006182. [PMID: 29357361 PMCID: PMC5794188 DOI: 10.1371/journal.pntd.0006182] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 08/02/2017] [Revised: 02/01/2018] [Accepted: 12/19/2017] [Indexed: 11/18/2022]
Abstract
Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses face technical difficulties in defining verifiable criteria for allocating clusters in phylogenetic trees, which calls for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. AP is computationally fast and works for any meaningful measure of similarity between data samples, and it has previously been applied successfully in bioinformatics to microarray and gene expression data; however, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full-genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, namely New World, Arctic/Arctic-like, Cosmopolitan, and Asian, as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform, transparent manner, not only for RABV, but also for other comparative sequence analyses.
Rabies is one of the oldest known zoonoses, caused by lyssaviruses. In recent years, more than 21,000 nucleotide sequences for rabies viruses (RABV) have been deposited in public databases. In this study, a novel mathematical approach called affinity propagation (AP) clustering was used to verifiably divide full-genome RABV sequences into genetic clusters. A panel of existing and novel RABV full-genome sequences was used to demonstrate the application of AP for RABV clustering. Combining AP with established phylogenetic analyses helps resolve phylogenetic relationships between more objectively determined clusters and sequences. This workflow will help to substantiate a transparent cluster distribution, not only for RABV, but also for other comparative sequence analyses.
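Affinity propagation works by repeatedly exchanging two kinds of messages between data points, "responsibilities" and "availabilities", until a stable set of exemplars emerges. The sketch below is a minimal NumPy version of the standard dense message-passing updates. It assumes a negative squared Euclidean similarity and uses the median similarity as the shared preference. It is not the RABV workflow from the study above, and `affinity_propagation` is a hypothetical illustrative name.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=15):
    """Illustrative (hypothetical) dense affinity propagation on an n x n similarity S.

    The diagonal of S holds the preferences (higher values -> more exemplars).
    Returns (exemplars, labels); labels[i] is the index of point i's exemplar.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        max_other = np.repeat(first[:, None], n, axis=1)
        max_other[rows, best] = second
        R = damping * R + (1 - damping) * (S - max_other)

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new

        # stop once the exemplar set has stayed the same for convergence_iter rounds
        current = tuple(np.flatnonzero(np.diag(A) + np.diag(R) > 0))
        if current == last and current:
            stable += 1
            if stable >= convergence_iter:
                break
        else:
            last, stable = current, 0

    exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars  # each exemplar labels itself
    return exemplars, labels

# Toy example: two well-separated 1-D groups
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(x[:, None] - x[None, :]) ** 2
off_diag = S[~np.eye(len(x), dtype=bool)]
np.fill_diagonal(S, np.median(off_diag))  # shared preference = median similarity
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

Because AP needs only a similarity matrix, any meaningful sequence similarity could replace the Euclidean one used here.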
Affiliation(s)
- Susanne Fischer
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Epidemiology, Greifswald-Insel Riems, Germany
- Conrad M. Freuling
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Molecular Virology and Cell Biology, OIE Reference Laboratory for Rabies, WHO Collaborating Centre for Rabies Surveillance and Research, Greifswald-Insel Riems, Germany
- Thomas Müller
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Molecular Virology and Cell Biology, OIE Reference Laboratory for Rabies, WHO Collaborating Centre for Rabies Surveillance and Research, Greifswald-Insel Riems, Germany
- Florian Pfaff
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Diagnostic Virology, Greifswald-Insel Riems, Germany
- Ulrich Bodenhofer
- Institute of Bioinformatics, Johannes Kepler University Linz, Linz, Austria
- Dirk Höper
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Molecular Virology and Cell Biology, OIE Reference Laboratory for Rabies, WHO Collaborating Centre for Rabies Surveillance and Research, Greifswald-Insel Riems, Germany
- Mareike Fischer
- Institute of Mathematics and Computer Science, University Greifswald, Greifswald, Germany
- Denise A. Marston
- Wildlife Zoonoses and Vector-Borne Diseases Research Group, Animal and Plant Health Agency (APHA), OIE Reference Laboratory for Rabies, WHO Collaborating Centre for Characterization of Lyssaviruses, Weybridge, United Kingdom
- Anthony R. Fooks
- Wildlife Zoonoses and Vector-Borne Diseases Research Group, Animal and Plant Health Agency (APHA), OIE Reference Laboratory for Rabies, WHO Collaborating Centre for Characterization of Lyssaviruses, Weybridge, United Kingdom
- Thomas C. Mettenleiter
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Molecular Virology and Cell Biology, OIE Reference Laboratory for Rabies, WHO Collaborating Centre for Rabies Surveillance and Research, Greifswald-Insel Riems, Germany
- Franz J. Conraths
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Epidemiology, Greifswald-Insel Riems, Germany
- Timo Homeier-Bachmann
- Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Institute of Epidemiology, Greifswald-Insel Riems, Germany
14
Golestan Hashemi FS, Razi Ismail M, Rafii Yusop M, Golestan Hashemi MS, Nadimi Shahraki MH, Rastegari H, Miah G, Aslani F. Intelligent mining of large-scale bio-data: Bioinformatics applications. BIOTECHNOL BIOTEC EQ 2017. [DOI: 10.1080/13102818.2017.1364977] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023]
Affiliation(s)
- Farahnaz Sadat Golestan Hashemi
- Plant Genetics, AgroBioChem Department, Gembloux Agro-Bio Tech, University of Liege, Liege, Belgium
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mohd Razi Ismail
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mohd Rafii Yusop
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mahboobe Sadat Golestan Hashemi
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Mohammad Hossein Nadimi Shahraki
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Hamid Rastegari
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Gous Miah
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Farzad Aslani
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
15
Chen QS, Wang D, Liu BL, Gao SF, Gao DL, Li GR. Combining affinity propagation clustering and mutual information network to investigate key genes in fibroid. Exp Ther Med 2017; 14:251-259. [PMID: 28672922 PMCID: PMC5488419 DOI: 10.3892/etm.2017.4481] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/09/2015] [Accepted: 02/01/2017] [Indexed: 01/21/2023]
Abstract
The aim of the present study was to investigate key genes in fibroids based on the multiple affinity propagation-Krzanowski and Lai (mAP-KL) method, which combines the maxT multiple hypothesis testing procedure, the Krzanowski and Lai (KL) cluster quality index, the affinity propagation (AP) clustering algorithm and a mutual information network (MIN) constructed by the context likelihood of relatedness (CLR) algorithm. To achieve this goal, mAP-KL was first implemented to identify exemplars in fibroid: the maxT function was employed to rank the genes of the training and test sets, and the top 200 genes were retained for further study. In addition, the KL cluster index was applied to determine the number of clusters, and the AP clustering algorithm was used to identify the clusters and their exemplars. Subsequently, a support vector machine (SVM) model was used to evaluate the classification performance of mAP-KL. Finally, topological properties (degree, closeness, betweenness and transitivity) of the exemplars in the MIN constructed with the CLR algorithm were assessed to identify key genes in fibroid. The SVM model confirmed that mAP-KL performed well in classifying normal controls and fibroid patients. A total of 9 clusters were identified with mAP-KL, with the exemplars CALCOCO2, COL4A2, COPS8, SNCG, PA2G4, C17orf70, MARK3, BTNL3 and TBC1D13. In the topological analysis of exemplars in the MIN, SNCG and COL4A2 were the two most significant genes across all four measures, and they were designated key genes in the progression of fibroid. In conclusion, two key genes (SNCG and COL4A2) and 9 exemplars were identified, and these may be potential biomarkers for the detection and treatment of fibroid.
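The mutual information network above is built with the CLR algorithm. CLR scores each gene pair by how much their mutual information (MI) stands out against each gene's own background of MI values. The NumPy sketch below follows the commonly used CLR formulation, combining the two one-sided z-scores as sqrt(z_i² + z_j²). It assumes a simple equal-width histogram MI estimator and computes per-gene background statistics over off-diagonal entries. The study's own estimator and settings are not stated in the abstract, and all function names are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Hypothetical helper: plug-in histogram estimate of MI, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_matrix(X, bins=8):
    """Hypothetical helper: pairwise MI between the rows (genes) of X."""
    n = X.shape[0]
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = mutual_information(X[i], X[j], bins)
    return M

def clr(M):
    """Hypothetical helper: CLR score from an MI matrix M."""
    n = M.shape[0]
    off = ~np.eye(n, dtype=bool)
    # background statistics of each gene's MI values, excluding the self-comparison
    mu = np.array([M[i, off[i]].mean() for i in range(n)])
    sd = np.array([M[i, off[i]].std() for i in range(n)])
    sd[sd == 0] = 1.0
    z = np.maximum(0.0, (M - mu[:, None]) / sd[:, None])  # z[i, j]: MI(i, j) vs gene i's background
    score = np.sqrt(z ** 2 + z.T ** 2)                     # combine both genes' backgrounds
    np.fill_diagonal(score, 0.0)
    return score

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 300))                # 8 genes x 300 samples
X[1] = X[0] + 0.1 * rng.normal(size=300)     # gene 1 tracks gene 0
scores = clr(mi_matrix(X))
i, j = np.unravel_index(np.argmax(scores), scores.shape)
print(sorted((int(i), int(j))))
```

Edges of the network would then be the pairs whose CLR score exceeds a chosen threshold. Topological measures such as degree or betweenness can be computed on that graph.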
Affiliation(s)
- Qian-Song Chen
- Department of Gynaecology, Tangshan Maternal and Child Healthcare Hospital, Tangshan, Hebei 063000, P.R. China
- Dan Wang
- Department of Gynaecology, Tangshan Maternal and Child Healthcare Hospital, Tangshan, Hebei 063000, P.R. China
- Bao-Lian Liu
- Department of Reproductive Genetics, Tangshan Maternal and Child Healthcare Hospital, Tangshan, Hebei 063000, P.R. China
- Shu-Feng Gao
- Department of Gynaecology, Tangshan Maternal and Child Healthcare Hospital, Tangshan, Hebei 063000, P.R. China
- Dan-Li Gao
- Department of Gynaecology, Tangshan Maternal and Child Healthcare Hospital, Tangshan, Hebei 063000, P.R. China
- Gui-Rong Li
- Department of Gynaecology, Tangshan Maternal and Child Healthcare Hospital, Tangshan, Hebei 063000, P.R. China
16
Cao H, Amendt BA. pySAPC, a python package for sparse affinity propagation clustering: Application to odontogenesis whole genome time series gene-expression data. Biochim Biophys Acta Gen Subj 2016; 1860:2613-8. [PMID: 27288587 DOI: 10.1016/j.bbagen.2016.06.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/26/2016] [Revised: 06/03/2016] [Accepted: 06/05/2016] [Indexed: 11/18/2022]
Abstract
BACKGROUND Developmental dental anomalies are common forms of congenital defects. The molecular mechanisms of dental anomalies are poorly understood. Systematic approaches such as clustering genes based on similar expression patterns could identify novel genes involved in dental anomalies and provide a framework for understanding molecular regulatory mechanisms of these genes during tooth development (odontogenesis). METHODS A python package (pySAPC) implementing a sparse affinity propagation clustering algorithm for large datasets was developed. Genome-wide pairwise similarity was calculated from expression-pattern similarity across 45 microarrays covering several stages of odontogenesis. RESULTS pySAPC identified 743 gene clusters based on expression pattern similarity during mouse tooth development. Three clusters are significantly enriched for genes associated with dental anomalies (FDR <0.1). The three clusters of genes have distinct expression patterns during odontogenesis. CONCLUSIONS Clustering genes based on similar expression profiles recovered several known regulatory relationships for genes involved in odontogenesis, as well as many novel genes that may be involved in the same genetic pathways as genes already shown to contribute to dental defects. GENERAL SIGNIFICANCE By using a sparse similarity matrix, pySAPC uses much less memory and CPU time than the original affinity propagation program, which uses a full similarity matrix. This python package will be useful for many applications where datasets are too large for a full similarity matrix. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang.
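The memory saving described above comes from storing only a subset of pairwise similarities instead of all n² of them. The sketch below shows one common sparsification rule in plain NumPy coordinate (COO) form: keep each gene's k nearest neighbours, plus one self-edge carrying the preference. The k-nearest-neighbour rule, the negative squared Euclidean similarity and the function name are all hypothetical choices for illustration. pySAPC's own sparsification and API are not shown here.

```python
import numpy as np

def knn_sparse_similarity(X, k, preference=None):
    """Hypothetical helper: sparse similarity in COO form (rows, cols, vals).

    For each row of X only its k most similar rows (negative squared Euclidean
    distance) are kept, plus one self-edge holding the preference. The dense
    n x n matrix is never stored; similarities are computed one row at a time.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    rows, cols, vals = [], [], []
    for i in range(n):
        s = -np.sum((X - X[i]) ** 2, axis=1)
        s[i] = -np.inf                          # never pick the point itself as a neighbour
        nbrs = np.argpartition(-s, k - 1)[:k]   # indices of the k largest similarities
        rows.append(np.full(k, i))
        cols.append(nbrs)
        vals.append(s[nbrs])
    rows, cols, vals = map(np.concatenate, (rows, cols, vals))
    if preference is None:
        preference = np.median(vals)            # median of the retained similarities
    self_idx = np.arange(n)
    return (np.concatenate([rows, self_idx]),
            np.concatenate([cols, self_idx]),
            np.concatenate([vals, np.full(n, preference)]))

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))                 # 1000 genes x 20 conditions (toy data)
r, c, v = knn_sparse_similarity(X, k=10)
print(len(v), "stored similarities instead of", X.shape[0] ** 2)
```

With k = 10 this stores n·(k + 1) = 11,000 values rather than 1,000,000. Sparse AP then passes messages only along these stored edges.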
Affiliation(s)
- Huojun Cao
- Iowa Institute for Oral Health Research, College of Dentistry, The University of Iowa, Iowa City, IA 52244, USA
- Brad A Amendt
- Iowa Institute for Oral Health Research, College of Dentistry, The University of Iowa, Iowa City, IA 52244, USA; Department of Anatomy and Cell Biology and Craniofacial Anomalies Research Center, Carver College of Medicine, The University of Iowa, Iowa City, IA 52244, USA.
17
18
Ye N, Yin H, Liu J, Dai X, Yin T. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature. BIOMED RESEARCH INTERNATIONAL 2015; 2015:853734. [PMID: 26199946 PMCID: PMC4496643 DOI: 10.1155/2015/853734] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/30/2015] [Revised: 05/20/2015] [Accepted: 06/11/2015] [Indexed: 12/21/2022]
Abstract
The huge amounts of gene expression data generated by microarray and next-generation sequencing technologies make it challenging to exploit their biological meaning. When searching for coexpressed genes, the outcome of data mining is largely affected by the choice of algorithm. Thus, it is highly desirable to provide multiple algorithm options in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files. This analytical toolkit provides four models (the mean, regression, delegate, and ensemble models) to identify coexpressed genes, and enables users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes. The utility of this toolkit is demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows users to choose among multiple algorithms for analyzing gene expression signatures.
Affiliation(s)
- Ning Ye
- The Southern Modern Forestry Collaborative Innovation Center, Nanjing Forestry University, Nanjing 210037, China
- College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
- Hengfu Yin
- Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Fuyang, Zhejiang 311400, China
- Key Laboratory of Forest genetics and breeding, Chinese Academy of Forestry, Fuyang, Zhejiang 311400, China
- Jingjing Liu
- The Southern Modern Forestry Collaborative Innovation Center, Nanjing Forestry University, Nanjing 210037, China
- College of Forest Resources and Environment, Nanjing Forestry University, Nanjing 210037, China
- Xiaogang Dai
- The Southern Modern Forestry Collaborative Innovation Center, Nanjing Forestry University, Nanjing 210037, China
- College of Forest Resources and Environment, Nanjing Forestry University, Nanjing 210037, China
- Tongming Yin
- The Southern Modern Forestry Collaborative Innovation Center, Nanjing Forestry University, Nanjing 210037, China
- College of Forest Resources and Environment, Nanjing Forestry University, Nanjing 210037, China
19
Penfold CA, Buchanan-Wollaston V. Modelling transcriptional networks in leaf senescence. JOURNAL OF EXPERIMENTAL BOTANY 2014; 65:3859-73. [PMID: 24600015 DOI: 10.1093/jxb/eru054] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 05/18/2023]
Abstract
The process of leaf senescence is induced by an extensive range of developmental and environmental signals and controlled by multiple, cross-linking pathways, many of which overlap with plant stress-response signals. Elucidation of this complex regulation requires a step beyond a traditional one-gene-at-a-time analysis. A more global analysis, using the statistical and mathematical tools of systems biology, is being applied to address this problem. A variety of modelling methods applicable to the analysis of current and future senescence data are reviewed and discussed using some senescence-specific examples. Network modelling with a senescence transcriptome time course followed by testing predictions with gene-expression data illustrates the application of systems biology tools.
Affiliation(s)
- Vicky Buchanan-Wollaston
- Warwick Systems Biology Centre, University of Warwick, Coventry CV4 7AL, UK
- School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
20
Wang M, Zhang W, Ding W, Dai D, Zhang H, Xie H, Chen L, Guo Y, Xie J. Parallel clustering algorithm for large-scale biological data sets. PLoS One 2014; 9:e91315. [PMID: 24705246 PMCID: PMC3976248 DOI: 10.1371/journal.pone.0091315] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 11/13/2013] [Accepted: 02/10/2014] [Indexed: 02/06/2023]
Abstract
BACKGROUND The recent explosion of biological data poses a great challenge to traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and longer runtimes. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, because the algorithm clusters data sets based on the similarities between data pairs, a similarity matrix, whose construction takes a long time, must be built before running affinity propagation. METHODS Two types of parallel architectures are proposed in this paper to accelerate the construction of the similarity matrix and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system is used for the affinity propagation algorithm because of its large memory and computing capacity. An appropriate data partitioning and reduction scheme is designed to minimize the global communication cost among processes. RESULT A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, indicating that the parallel algorithm can handle large-scale data sets effectively. The parallel affinity propagation also performs well when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.
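The shared-memory step above can be illustrated on a single machine. The similarity matrix is split into row blocks, and each worker fills its own disjoint block of one shared array, so workers never need to communicate. This minimal sketch uses Python threads (many NumPy operations, including matrix multiplication, release the GIL) and assumes negative squared Euclidean distance as the similarity. It does not reproduce the paper's partitioning scheme or its distributed AP stage, and `parallel_similarity` is a hypothetical name.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_similarity(X, n_workers=4):
    """Hypothetical helper: negative squared Euclidean similarity, filled by row blocks."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.einsum("ij,ij->i", X, X)              # squared norms, shared by all workers
    S = np.empty((n, n))                          # shared output; row blocks are disjoint
    bounds = np.linspace(0, n, n_workers + 1).astype(int)

    def fill(b):
        lo, hi = bounds[b], bounds[b + 1]
        # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
        S[lo:hi] = -(sq[lo:hi, None] + sq[None, :] - 2.0 * X[lo:hi] @ X.T)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(fill, range(n_workers)))    # list() re-raises any worker exception
    return S

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))
S = parallel_similarity(X, n_workers=4)
direct = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # serial reference
print(np.allclose(S, direct))
```

Row-block partitioning keeps each worker's writes independent, which is the same property that lets the block results be gathered without a global reduction.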
Affiliation(s)
- Minchao Wang
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Wu Zhang
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- High Performance Computing Center, Shanghai University, Shanghai, P.R.China
- Wang Ding
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Dongbo Dai
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Huiran Zhang
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Hao Xie
- College of Stomatology, Wuhan University, Wuhan, P.R.China
- Luonan Chen
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P.R.China
- Yike Guo
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Department of Computing, Imperial College London, London, United Kingdom
- Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
21
Cui Y, Zheng CH, Yang J. Identifying subspace gene clusters from microarray data using low-rank representation. PLoS One 2013; 8:e59377. [PMID: 23527177 PMCID: PMC3602020 DOI: 10.1371/journal.pone.0059377] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 10/23/2012] [Accepted: 02/13/2013] [Indexed: 12/23/2022]
Abstract
Identifying subspace gene clusters from gene expression data is useful for discovering novel functional gene interactions. In this paper, we propose to use low-rank representation (LRR) to identify subspace gene clusters from microarray data. LRR seeks the lowest-rank representation among all candidates that represent the genes as linear combinations of the bases in the dataset. The clusters can be extracted from the block-diagonal representation matrix obtained with LRR, and they capture well the intrinsic patterns of genes with similar functions. Meanwhile, the LRR parameter balances the effect of noise, so the method can extract useful information from data with a high level of background noise. Compared with traditional methods, our approach can identify genes with similar functions even when their expression profiles are not similar, and it can assign one gene to multiple clusters. Moreover, our method is robust to noise and can identify more biologically relevant gene clusters. When applied to three public datasets, the LRR-based method proved superior to existing methods for identifying subspace gene clusters.
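The block-diagonal structure mentioned above is easiest to see in the noise-free case. There, the LRR problem min ||Z||_* subject to X = XZ (columns of X are genes) has the known closed-form solution Z* = V_r V_r^T, where X = U_r Σ_r V_r^T is the skinny SVD. The sketch below uses only that special case. The method in the abstract adds a noise term weighted by a parameter and solves it iteratively, which is not shown. The toy data and the function name are hypothetical.

```python
import numpy as np

def lrr_noiseless(X, rel_tol=1e-10):
    """Hypothetical helper: closed-form solution of min ||Z||_* s.t. X = X Z.

    Columns of X are the data points (genes). Z* = V_r V_r^T, where V_r holds the
    right singular vectors belonging to the non-negligible singular values of X.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0]))           # numerical rank
    Vr = Vt[:r].T
    return Vr @ Vr.T

rng = np.random.default_rng(3)
m = 20                                            # conditions (rows)
basis_a = rng.normal(size=(m, 2))                 # two independent 2-D subspaces
basis_b = rng.normal(size=(m, 2))
genes_a = basis_a @ rng.normal(size=(2, 10))      # 10 genes lying in subspace A
genes_b = basis_b @ rng.normal(size=(2, 10))      # 10 genes lying in subspace B
X = np.hstack([genes_a, genes_b])                 # 20 x 20, genes as columns
Z = lrr_noiseless(X)
print(np.abs(Z[:10, 10:]).max())                  # off-diagonal block is numerically zero
```

Because genes from different independent subspaces get (near-)zero mutual coefficients, clusters can be read off |Z|, for example with spectral clustering.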
Affiliation(s)
- Yan Cui
- School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Chun-Hou Zheng
- College of Electrical Engineering and Automation, Anhui University, Hefei, Anhui, China
- Jian Yang
- School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
22
Jenkins DJ, Finkenstädt B, Rand DA. A temporal switch model for estimating transcriptional activity in gene expression. Bioinformatics 2013; 29:1158-65. [PMID: 23479351 PMCID: PMC3634189 DOI: 10.1093/bioinformatics/btt111] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Motivation: The analysis and mechanistic modelling of time series gene expression data provided by techniques such as microarrays, NanoString, reverse transcription–polymerase chain reaction and advanced sequencing are invaluable for developing an understanding of the variation in key biological processes. We address this by proposing the estimation of a flexible dynamic model, which decouples temporal synthesis and degradation of mRNA and, hence, allows for transcriptional activity to switch between different states. Results: The model is flexible enough to capture a variety of observed transcriptional dynamics, including oscillatory behaviour, in a way that is compatible with the demands imposed by the quality, time-resolution and quantity of the data. We show that the timing and number of switch events in transcriptional activity can be estimated alongside individual gene mRNA stability with the help of a Bayesian reversible jump Markov chain Monte Carlo algorithm. To demonstrate the methodology, we focus on modelling the wild-type behaviour of a selection of 200 circadian genes of the model plant Arabidopsis thaliana. The results support the idea that using a mechanistic model to identify transcriptional switch points is likely to strongly contribute to efforts in elucidating and understanding key biological processes, such as transcription and degradation. Contact: B.F.Finkenstadt@Warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
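Decoupling synthesis from degradation can be written as dM/dt = τ(t) - δM(t). Here τ(t) is a transcription rate that stays constant between switch times, and δ is the mRNA degradation rate. Assuming that form, the NumPy sketch below evaluates the deterministic mRNA trajectory analytically for given switch times and rates. It does not implement the Bayesian reversible-jump MCMC used to estimate the number and timing of switches. The function name and the example values are hypothetical.

```python
import numpy as np

def switch_model_mrna(t, switch_times, rates, delta, m0):
    """Hypothetical helper: analytic M(t) for dM/dt = tau(t) - delta * M(t), t >= 0.

    tau(t) = rates[j] on [b_j, b_{j+1}), with b_0 = 0 and b_1 < b_2 < ... the
    switch times, so len(rates) must equal len(switch_times) + 1.
    """
    t = np.asarray(t, dtype=float)
    bounds = np.concatenate(([0.0], np.asarray(switch_times, dtype=float)))
    if len(rates) != len(bounds):
        raise ValueError("need exactly one rate per regime")
    if np.any(t < 0):
        raise ValueError("times must be non-negative")
    # mRNA level at the start of each regime, propagated across switch points
    m_start = [float(m0)]
    for j in range(len(bounds) - 1):
        steady = rates[j] / delta
        dt = bounds[j + 1] - bounds[j]
        m_start.append(steady + (m_start[-1] - steady) * np.exp(-delta * dt))
    m_start = np.array(m_start)
    rates = np.asarray(rates, dtype=float)
    regime = np.searchsorted(bounds, t, side="right") - 1   # active regime at each time
    steady = rates[regime] / delta
    # within a regime, M relaxes exponentially towards tau_j / delta
    return steady + (m_start[regime] - steady) * np.exp(-delta * (t - bounds[regime]))

times = np.linspace(0.0, 48.0, 97)                # e.g. hours, sampled every 30 min
m = switch_model_mrna(times, switch_times=[12.0, 30.0],
                      rates=[0.5, 3.0, 0.2], delta=0.3, m0=1.0)
print(np.round(m[::12], 3))
```

In an inference setting, this forward model would supply the mean of the observation model. The switch times, rates and δ would then be sampled.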
Affiliation(s)
- Dafyd J Jenkins
- Warwick Systems Biology Centre, University of Warwick, Coventry CV4 7AL, UK
23
Zampetaki A, Willeit P, Tilling L, Drozdov I, Prokopi M, Renard JM, Mayr A, Weger S, Schett G, Shah A, Boulanger CM, Willeit J, Chowienczyk PJ, Kiechl S, Mayr M. Prospective study on circulating MicroRNAs and risk of myocardial infarction. J Am Coll Cardiol 2012; 60:290-9. [PMID: 22813605 DOI: 10.1016/j.jacc.2012.03.056] [Citation(s) in RCA: 375] [Impact Index Per Article: 28.8] [Received: 11/07/2011] [Revised: 03/02/2012] [Accepted: 03/12/2012] [Indexed: 01/13/2023]
Abstract
OBJECTIVES This study sought to explore the association between baseline levels of microRNAs (miRNAs) (1995) and incident myocardial infarction (1995 to 2005) in the Bruneck cohort and determine their cellular origin. BACKGROUND Circulating miRNAs are emerging as potential biomarkers. We previously identified an miRNA signature for type 2 diabetes in the general population. METHODS A total of 19 candidate miRNAs were quantified by real-time polymerase chain reactions in 820 participants. RESULTS In multivariable Cox regression analysis, 3 miRNAs were consistently and significantly related to incident myocardial infarction: miR-126 showed a positive association (multivariable hazard ratio: 2.69 [95% confidence interval: 1.45 to 5.01], p = 0.002), whereas miR-223 and miR-197 were inversely associated with disease risk (multivariable hazard ratio: 0.47 [95% confidence interval: 0.29 to 0.75], p = 0.002, and 0.56 [95% confidence interval: 0.32 to 0.96], p = 0.036). To determine their cellular origin, healthy volunteers underwent limb ischemia-reperfusion generated by thigh cuff inflation, and plasma miRNA changes were analyzed at baseline, 10 min, 1 h, 5 h, 2 days, and 7 days. Computational analysis using the temporal clustering by affinity propagation algorithm identified 6 distinct miRNA clusters. One cluster included all miRNAs associated with the risk of future myocardial infarction. It was characterized by early (1 h) and sustained activation (7 days) post-ischemia-reperfusion injury and consisted of miRNAs predominantly expressed in platelets. CONCLUSIONS In subjects with subsequent myocardial infarction, differential co-expression patterns of circulating miRNAs occur around endothelium-enriched miR-126, with platelets being a major contributor to this miRNA signature.
Affiliation(s)
- Anna Zampetaki
- King's British Heart Foundation Centre, King's College London, United Kingdom
24
Cardille JA, White JC, Wulder MA, Holland T. Representative landscapes in the forested area of Canada. ENVIRONMENTAL MANAGEMENT 2012; 49:163-173. [PMID: 22109729 PMCID: PMC3249557 DOI: 10.1007/s00267-011-9785-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/01/2010] [Accepted: 10/26/2011] [Indexed: 05/31/2023]
Abstract
Canada is a large nation with forested ecosystems that occupy over 60% of the national land base, and knowledge of the patterns of Canada's land cover is important to proper environmental management of this vast resource. To this end, a circa 2000 Landsat-derived land cover map of the forested ecosystems of Canada has created a new window into understanding the composition and configuration of land cover patterns in forested Canada. Strategies for summarizing such large expanses of land cover are increasingly important, as land managers work to study and preserve distinctive areas, as well as to identify representative examples of current land-cover and land-use assemblages. Meanwhile, the development of extremely efficient clustering algorithms has become increasingly important in the world of computer science, in which billions of pieces of information on the internet are continually sifted for meaning for a vast variety of applications. One recently developed clustering algorithm quickly groups large numbers of items of any type in a given data set while simultaneously selecting a representative (or "exemplar") from each cluster. In this context, the availability of both advanced data processing methods and a nationally available set of landscape metrics presents an opportunity to identify sets of representative landscapes to better understand landscape pattern, variation, and distribution across the forested area of Canada. In this research, we first identify and provide context for a small, interpretable set of exemplar landscapes that objectively represent land cover in each of Canada's ten forested ecozones. Then, we demonstrate how this approach can be used to identify flagship and satellite long-term study areas inside and outside protected areas in the province of Ontario. These applications aid our understanding of Canada's forest while augmenting its management toolbox, and may signal a broad range of applications for this versatile approach.
Affiliation(s)
- Jeffrey A Cardille
- Department of Geography, University of Montreal, 520 Chemin Cote Ste Catherine Outremont, Montreal H2V 2B8, Quebec, Canada.
25
Penfold CA, Wild DL. How to infer gene networks from expression profiles, revisited. Interface Focus 2011; 1:857-70. [PMID: 23226586 DOI: 10.1098/rsfs.2011.0053] [Citation(s) in RCA: 136] [Impact Index Per Article: 9.7] [Received: 05/31/2011] [Accepted: 07/12/2011] [Indexed: 01/17/2023]
Abstract
Inferring the topology of a gene-regulatory network (GRN) from genome-scale time-series measurements of transcriptional change has proved useful for disentangling complex biological processes. To address the challenges associated with this inference, a number of competing approaches have previously been used, including examples from information theory, Bayesian and dynamic Bayesian networks (DBNs), and ordinary differential equation (ODE) or stochastic differential equation models. The performance of these competing approaches has previously been assessed using a variety of in silico and in vivo datasets. Here, we revisit this work by assessing the performance of more recent network inference algorithms, including a novel non-parametric learning approach based upon nonlinear dynamical systems. For larger GRNs, containing hundreds of genes, these non-parametric approaches infer network structures more accurately than traditional approaches, but at significant computational cost. For smaller systems, DBNs are competitive with the non-parametric approaches with respect to computational time and accuracy, and both of these approaches appear to be more accurate than Granger causality-based methods and those using simple ODE models.
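As a reference point for the "simple ODE models" mentioned above, one basic approach fits a linear system dx/dt ≈ Ax + b to time-series data. It does so by least squares on finite-difference derivatives, and reads a nonzero A[i, j] as a putative regulation of gene i by gene j. The NumPy sketch below is that generic baseline under those assumptions. It is not one of the algorithms assessed in the review, and the function name and toy network are hypothetical.

```python
import numpy as np

def infer_linear_ode(X, t):
    """Hypothetical helper: fit dx/dt ~= A x + b from a genes x timepoints matrix X."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    dX = np.diff(X, axis=1) / np.diff(t)                 # forward-difference derivatives
    D = np.vstack([X[:, :-1], np.ones(X.shape[1] - 1)])  # regressors plus an intercept row
    # Solve dX ~= W D for W = [A | b] via least squares on the transposed system
    W = np.linalg.lstsq(D.T, dX.T, rcond=None)[0].T
    return W[:, :-1], W[:, -1]

# Toy 3-gene cascade (gene 0 activates gene 1, gene 1 represses gene 2),
# simulated with explicit Euler steps so the finite differences are exact
A_true = np.array([[-1.0, 0.0, 0.0],
                   [0.8, -0.5, 0.0],
                   [0.0, -0.6, -0.2]])
b_true = np.array([1.0, 0.0, 0.0])
dt, steps = 0.1, 80
X = np.zeros((3, steps + 1))
for k in range(steps):
    X[:, k + 1] = X[:, k] + dt * (A_true @ X[:, k] + b_true)
t = dt * np.arange(steps + 1)
A_hat, b_hat = infer_linear_ode(X, t)
print(np.round(A_hat, 3))
```

On noisy or undersampled real data, such linear fits degrade quickly and need regularisation or sparsity priors. This is one reason the review reports them as less accurate than DBNs and non-parametric approaches.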
26
Bodenhofer U, Kothmeier A, Hochreiter S. APCluster: an R package for affinity propagation clustering. Bioinformatics 2011; 27:2463-4. [PMID: 21737437 DOI: 10.1093/bioinformatics/btr406] [Citation(s) in RCA: 246] [Impact Index Per Article: 17.6] [Indexed: 11/14/2022]
Abstract
SUMMARY Affinity propagation (AP) clustering has recently gained increasing popularity in bioinformatics. AP clustering has the advantage that it identifies typical cluster members, the so-called exemplars. We provide an R implementation of this promising new clustering technique, reflecting the ubiquity of R in bioinformatics. This article introduces the package and presents an application from structural biology. AVAILABILITY The R package apcluster is available via CRAN (the Comprehensive R Archive Network): http://cran.r-project.org/web/packages/apcluster CONTACT apcluster@bioinf.jku.at; bodenhofer@bioinf.jku.at.
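The exemplar-based clustering described in this abstract can be sketched in a few dozen lines. The following is a minimal pure-Python version of the standard affinity propagation message-passing updates (responsibilities and availabilities, as introduced by Frey and Dueck); it is not the apcluster package's code. Negative squared Euclidean distance as the similarity, the median off-diagonal similarity as the default preference, a damping factor of 0.5, and a fixed iteration count are assumptions chosen for illustration.

```python
import statistics


def affinity_propagation(points, damping=0.5, iterations=200, preference=None):
    """Return (exemplar indices, label per point), where each label is an exemplar index."""
    n = len(points)
    # Similarity s(i, k): negative squared Euclidean distance (an assumed, common choice).
    s = [[-sum((p - q) ** 2 for p, q in zip(points[i], points[k])) for k in range(n)]
         for i in range(n)]
    if preference is None:
        preference = statistics.median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = preference  # self-similarity: prior willingness of k to be an exemplar
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = s[i][k] - (second if k == best else vals[best])
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if a[k][k] + r[k][k] > 0]
    if not exemplars:  # not converged: fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: a[k][k] + r[k][k])]
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k]) for i in range(n)]
    return exemplars, labels


if __name__ == "__main__":
    # Hypothetical 1-D toy data: two well-separated groups.
    data = [(1.0,), (1.2,), (0.9,), (5.0,), (5.3,), (4.8,)]
    print(affinity_propagation(data))
```

Raising `preference` yields more exemplars and lowering it yields fewer; it is the main control over the number of clusters, since affinity propagation does not take k as an input. The dense n × n similarity matrix is what limits this naive version to small data sets.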
Affiliation(s)
- Ulrich Bodenhofer
- Institute of Bioinformatics, Johannes Kepler University, Linz, Austria.
27
Gorte M, Horstman A, Page RB, Heidstra R, Stromberg A, Boutilier K. Microarray-based identification of transcription factor target genes. Methods Mol Biol 2011; 754:119-41. [PMID: 21720950 DOI: 10.1007/978-1-61779-154-3_7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Abstract
Microarray analysis is widely used to identify transcriptional changes associated with genetic perturbation or signaling events. Here we describe its application to the identification of plant transcription factor (TF) target genes, with emphasis on the design of suitable DNA constructs for controlling TF activity, the experimental setup, the statistical analysis of the microarray data, and the validation of target genes.
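One common way to carry out the statistical step mentioned in this abstract is to combine an effect-size filter with false-discovery-rate control. The Python sketch below is an assumption, not the chapter's prescribed analysis: it computes a log2 fold change from replicate intensities, applies the Benjamini-Hochberg adjustment to per-gene p-values that are assumed to come from a separate per-gene test (not shown), and keeps genes passing hypothetical cut-offs (|log2FC| >= 1, q <= 0.05). The gene names and numbers in the usage lines are made up.

```python
import math
import statistics


def log2_fold_change(treated, control):
    """Difference of mean log2 intensities (TF-induced minus control replicates)."""
    return (statistics.fmean(math.log2(v) for v in treated)
            - statistics.fmean(math.log2(v) for v in control))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjustment; returns q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def candidate_targets(expression, pvalues, min_abs_lfc=1.0, max_q=0.05):
    """Genes whose |log2 fold change| and BH q-value both pass the (hypothetical) cut-offs."""
    genes = list(expression)
    q = benjamini_hochberg([pvalues[g] for g in genes])
    hits = []
    for g, qv in zip(genes, q):
        lfc = log2_fold_change(*expression[g])
        if abs(lfc) >= min_abs_lfc and qv <= max_q:
            hits.append(g)
    return hits


if __name__ == "__main__":
    # Hypothetical replicate intensities (TF-induced, control) and per-gene p-values.
    expression = {"g1": ([8, 8, 8], [2, 2, 2]),
                  "g2": ([2, 2, 2], [2, 2, 2]),
                  "g3": ([1, 1, 1], [4, 4, 4])}
    pvalues = {"g1": 0.001, "g2": 0.5, "g3": 0.01}
    print(candidate_targets(expression, pvalues))  # ['g1', 'g3']
```

Filtering on both effect size and q-value guards against calling genes with statistically detectable but biologically negligible changes; candidates would still need the independent validation the chapter describes.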
Affiliation(s)
- Maartje Gorte
- Molecular Genetics Group, Department of Biology, Faculty of Science, Utrecht University, Utrecht, The Netherlands.
28
Verhage A, van Wees SC, Pieterse CM. Plant immunity: it's the hormones talking, but what do they say? PLANT PHYSIOLOGY 2010; 154:536-40. [PMID: 20921180 PMCID: PMC2949039 DOI: 10.1104/pp.110.161570] [Citation(s) in RCA: 188] [Impact Index Per Article: 12.5] [Received: 06/18/2010] [Accepted: 06/25/2010] [Indexed: 05/18/2023]
29
Chang X, Liu S, Yu YT, Li YX, Li YY. Identifying modules of coexpressed transcript units and their organization of Saccharopolyspora erythraea from time series gene expression profiles. PLoS One 2010; 5:e12126. [PMID: 20711345 PMCID: PMC2920828 DOI: 10.1371/journal.pone.0012126] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 02/26/2010] [Accepted: 07/14/2010] [Indexed: 12/18/2022]
Abstract
Background The Saccharopolyspora erythraea genome sequence was released in 2007. To examine gene regulation at the whole-transcriptome level, an expression microarray was designed specifically on the genome sequence of S. erythraea strain NRRL 2338. Based on these data, we set out to investigate the potential transcriptional regulatory networks and their organization. Methodology/Principal Findings In view of the hierarchical structure of bacterial transcriptional regulation, we constructed a hierarchical coexpression network at the whole-transcriptome level. A total of 27 modules were identified from 1255 transcript units (TUs) differentially expressed across the time course, and these modules were further classified into four groups. Functional enrichment analysis supported the biological significance of our hierarchical network. It indicated that primary metabolism is activated in the first rapid growth phase (phase A), and secondary metabolism is induced when growth slows down (phase B). Among the 27 modules, two are highly correlated with erythromycin production. One contains all genes in the erythromycin-biosynthetic (ery) gene cluster, and the other appears to be associated with erythromycin production through shared intermediate metabolites. Non-concomitant correlation between production and expression regulation was observed. In particular, by calculating partial correlation coefficients and building a network based on a Gaussian graphical model, we found intrinsic associations between modules, including, as expected, the association between the two erythromycin production-correlated modules. Conclusions This work created a hierarchical model that clusters transcriptome data into coordinated modules, and modules into groups, across the time course, giving insight into concerted transcriptional regulation, especially the regulation associated with erythromycin production in S. erythraea. This strategy may be extended to studies of other prokaryotic microorganisms.
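The partial-correlation step behind a Gaussian graphical model can be illustrated by inverting the correlation matrix of module profiles: the partial correlation between modules i and j, conditioned on all other modules, is -P_ij / sqrt(P_ii * P_jj), where P is the inverse correlation (precision) matrix. The pure-Python sketch below assumes each module is summarized by a single time-course profile (for example, the mean of its TUs) and that there are more time points than modules, so the matrix is invertible; the function names and toy data are hypothetical, and this is not necessarily the exact procedure used by Chang et al.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length profiles."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cov / math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for a handful of modules)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(profiles):
    """Full-order partial correlations from the precision (inverse correlation) matrix."""
    k = len(profiles)
    corr = [[1.0 if i == j else pearson(profiles[i], profiles[j]) for j in range(k)]
            for i in range(k)]
    prec = invert(corr)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical module profiles over 8 time points: x and y both track a driver z.
    z = [1, 2, 3, 4, 5, 6, 7, 8]
    x = [v + e for v, e in zip(z, [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3])]
    y = [v + e for v, e in zip(z, [-0.1, 0.2, -0.3, 0.1, 0.0, -0.2, 0.4, -0.1])]
    pc = partial_correlations([x, y, z])
    print(round(pearson(x, y), 3), round(pc[0][1], 3))
```

Two profiles that both follow a shared driver can be strongly correlated marginally while their partial correlation given that driver is much weaker; this is what lets a Gaussian graphical model separate direct module-module associations from indirect ones.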
Affiliation(s)
- Xiao Chang
- Key Lab of Systems Biology, Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Graduate School of the Chinese Academy of Sciences, Beijing, China
- Shuai Liu
- Test Center for Agriculture Quality of Jinan, Jinan, Shandong, China
- Yong-Tao Yu
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Yi-Xue Li
- Key Lab of Systems Biology, Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Yuan-Yuan Li
- Key Lab of Systems Biology, Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shanghai Center for Bioinformation Technology, Shanghai, China