1
Sánchez-Baizán N, Jarne-Sanz I, Roco ÁS, Schartl M, Piferrer F. Extraordinary variability in gene activation and repression programs during gonadal sex differentiation across vertebrates. Front Cell Dev Biol 2024; 12:1328365. [PMID: 38322165 PMCID: PMC10844511 DOI: 10.3389/fcell.2024.1328365] [Received: 10/26/2023] [Accepted: 01/11/2024]
Abstract
Genes involved in gonadal sex differentiation have traditionally been thought to be fairly conserved across vertebrates, but this has lately been questioned. Here, we performed the first comparative analysis of gonadal transcriptomes across vertebrates, from fish to mammals. Our results unambiguously show an extraordinary overall variability in gene activation and repression programs, without a phylogenetic pattern. During sex differentiation, genes such as dmrt1, sox9, amh, cyp19a and foxl2 were consistently either male- or female-enriched across species, whereas many of the genes with the greatest expression change within each sex were not. We also found that downregulation in the opposite sex, which had previously been quantified only in the mouse model, was also prominent in the other vertebrates. Finally, we report 16 novel conserved markers (e.g., fshr and dazl) and 11 signaling pathways. We propose viewing vertebrate gonadal sex differentiation as a hierarchical network, with conserved hub genes such as sox9 and amh alongside less connected and less conserved nodes. This framework implies that evolutionary pressures may act on genes according to their level of connectivity.
Affiliation(s)
- Núria Sánchez-Baizán
- Institut de Ciències del Mar (ICM), Spanish National Research Council (CSIC), Barcelona, Spain
- Ignasi Jarne-Sanz
- Institut de Ciències del Mar (ICM), Spanish National Research Council (CSIC), Barcelona, Spain
- Álvaro S. Roco
- Developmental Biochemistry, Biocenter, University of Wuerzburg, Wuerzburg, Germany
- Department of Experimental Biology, Faculty of Experimental Sciences, University of Jaén, Jaén, Spain
- Manfred Schartl
- Developmental Biochemistry, Biocenter, University of Wuerzburg, Wuerzburg, Germany
- Xiphophorus Genetic Stock Center, Texas State University, San Marcos, TX, United States
- Francesc Piferrer
- Institut de Ciències del Mar (ICM), Spanish National Research Council (CSIC), Barcelona, Spain
2
Sillanpää MJ, Pikkuhookana P, Abrahamsson S, Knürr T, Fries A, Lerceteau E, Waldmann P, García-Gil MR. Simultaneous estimation of multiple quantitative trait loci and growth curve parameters through hierarchical Bayesian modeling. Heredity (Edinb) 2011; 108:134-46. [PMID: 21792229 DOI: 10.1038/hdy.2011.56]
Abstract
A novel hierarchical quantitative trait locus (QTL) mapping method is presented that uses a polynomial growth function and a multiple-QTL model (with no dependence on time) in a multitrait framework. The method considers a population-based sample in which individuals have been phenotyped over time for a dynamic trait and genotyped at a given set of loci. A specific feature of the proposed approach is that, instead of an average functional curve, each individual has its own functional curve. Moreover, each QTL can modify the dynamic characteristics of an individual's trait value through its influence on one or more growth curve parameters. Apparent advantages of the approach include (1) the assumption of time-independent QTL and environmental effects, (2) no need for an autoregressive covariance structure for the residuals and (3) the flexibility to use variable selection methods. As a by-product of the method, heritabilities and genetic correlations can also be estimated for individual growth curve parameters, which are treated as latent traits. To select trait-associated loci in the model, we use a modified version of the well-known Bayesian adaptive shrinkage technique. We illustrate our approach by analysing a subsample of 500 individuals from the simulated QTLMAS 2009 data set, as well as simulation replicates and a real Scots pine (Pinus sylvestris) data set, using temporal measurements of height as the dynamic trait of interest.
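The idea of giving each individual its own growth curve, whose parameters then act as latent traits, can be illustrated with a minimal Python sketch. It fits a quadratic growth curve to each individual by ordinary least squares and then contrasts one curve parameter between two genotype classes at a single marker. This is an assumption-laden stand-in: the hierarchical Bayesian model, the multiple-QTL structure and the adaptive shrinkage described above are not reproduced, and the helper names and toy data are hypothetical.

```python
from statistics import mean


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_growth_curve(times, heights, degree=2):
    """Per-individual polynomial growth curve via the normal equations; returns [b0, b1, ..., b_degree]."""
    k = degree + 1
    xtx = [[sum(t ** (i + j) for t in times) for j in range(k)] for i in range(k)]
    xty = [sum(h * t ** i for t, h in zip(times, heights)) for i in range(k)]
    return solve(xtx, xty)


def genotype_contrast(curve_params, genotypes, index):
    """Hypothetical single-marker contrast: mean of parameter `index` in class 1 minus class 0."""
    class1 = [p[index] for p, g in zip(curve_params, genotypes) if g == 1]
    class0 = [p[index] for p, g in zip(curve_params, genotypes) if g == 0]
    return mean(class1) - mean(class0)


# Hypothetical toy data: four individuals measured at five time points.
times = [0, 1, 2, 3, 4]
true_params = [[1.0, 2.0, 0.5], [1.5, 2.0, 0.4], [1.0, 3.0, 0.5], [0.5, 3.0, 0.6]]
genotypes = [0, 0, 1, 1]
heights = [[b0 + b1 * t + b2 * t * t for t in times] for b0, b1, b2 in true_params]
fitted = [fit_growth_curve(times, h) for h in heights]
slope_effect = genotype_contrast(fitted, genotypes, index=1)
```

Because the toy heights are exact quadratics, the fitted parameters recover `true_params`, and the linear-term contrast between the two genotype classes is 1.0.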
Affiliation(s)
- M J Sillanpää
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
3
Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M. Mapping complex disease traits with global gene expression. Nat Rev Genet 2009; 10:184-94. [PMID: 19223927 PMCID: PMC4550035 DOI: 10.1038/nrg2537]
Abstract
Variation in gene expression is an important mechanism underlying susceptibility to complex disease. The simultaneous genome-wide assay of gene expression and genetic variation allows the mapping of the genetic factors that underpin individual differences in quantitative levels of expression (expression QTLs; eQTLs). The availability of systematically generated eQTL information could provide immediate insight into a biological basis for disease associations identified through genome-wide association (GWA) studies, and can help to identify networks of genes involved in disease pathogenesis. Although there are limitations to current eQTL maps, understanding of disease will be enhanced with novel technologies and international efforts that extend to a wide range of new samples and tissues.
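The core of eQTL mapping, testing whether a gene's expression level differs with genotype at a marker, can be sketched in Python as a per-SNP simple linear regression of expression on allele dosage (0, 1 or 2 copies). This is a minimal illustration under that additive-model assumption; real eQTL studies add covariates, multiple-testing correction and far larger samples, and the SNP identifiers and values below are hypothetical.

```python
import math


def eqtl_scan(expression, dosages):
    """Regress one gene's expression on allele dosage at each SNP.

    Returns {snp: (slope, t_statistic)} for every SNP whose dosage varies across samples.
    """
    n = len(expression)
    y_mean = sum(expression) / n
    results = {}
    for snp, x in dosages.items():
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0:  # monomorphic SNP: no genotype contrast to test
            continue
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, expression))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
        rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, expression))
        se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
        t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
        results[snp] = (slope, t_stat)
    return results


# Hypothetical data: one gene measured in six individuals, genotyped at two SNPs.
expression = [1.0, 1.1, 2.0, 2.1, 2.9, 3.1]
dosages = {"snp_a": [0, 0, 1, 1, 2, 2], "snp_b": [0, 1, 0, 1, 0, 1]}
scan = eqtl_scan(expression, dosages)
```

In this toy example expression rises by roughly one unit per allele copy at `snp_a` (slope 0.975), so its t statistic is far larger than that of `snp_b`.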
Affiliation(s)
- William Cookson
- National Heart and Lung Institute, Imperial College London, SW3 6LY, England
- Liming Liang
- Center for Statistical Genetics, Dept. of Biostatistics, SPH II, Ann Arbor, MI 48109-2029, USA
- Gonçalo Abecasis
- Center for Statistical Genetics, Dept. of Biostatistics, SPH II, Ann Arbor, MI 48109-2029, USA
- Miriam Moffatt
- National Heart and Lung Institute, Imperial College London, SW3 6LY, England
- Mark Lathrop
- CEA/Centre National de Genotypage, 91057 Evry, France
4
Chen G, Dai Y. A new distance measurement for clustering time-course gene expression data. Conf Proc IEEE Eng Med Biol Soc 2007; 2004:2929-32. [PMID: 17270891 DOI: 10.1109/iembs.2004.1403832]
Abstract
The purpose of this paper is twofold. First, a new distance measure is proposed for temporal microarray gene expression data, based on the angles of the line segments in the curve of each individual gene expression profile; hierarchical agglomerative clustering methods are then used with this distance definition. Second, the quality of the resulting clusterings is assessed with the Davies-Bouldin validity index (DBI). We conclude that the DBI may not be an appropriate indicator of cluster quality for time-course gene expression data, and we provide an alternative DBI based on the normalized Pearson correlation for this purpose.
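An angle-based distance of this kind can be sketched in Python. The abstract does not give the exact formula, so this sketch assumes one: each segment's angle is atan2(change in value, change in time), and the distance between two profiles is the mean absolute difference of their segment angles, which makes it insensitive to a constant vertical offset. The Davies-Bouldin index is written generically over any distance function, using the usual centroid-based definition; the function names are illustrative.

```python
import math


def segment_angles(times, values):
    """Angle (radians) of each line segment joining consecutive points of a profile."""
    return [math.atan2(v2 - v1, t2 - t1)
            for t1, t2, v1, v2 in zip(times, times[1:], values, values[1:])]


def angle_distance(times, a, b):
    """Assumed angle-based distance: mean absolute difference of corresponding segment angles."""
    angles_a, angles_b = segment_angles(times, a), segment_angles(times, b)
    return sum(abs(x - y) for x, y in zip(angles_a, angles_b)) / len(angles_a)


def davies_bouldin(clusters, dist):
    """Davies-Bouldin index: mean over clusters i of max_j (S_i + S_j) / d(c_i, c_j); lower is better.

    S_i is the mean distance of cluster i's members to its centroid c_i.
    """
    centroids = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    scatter = [sum(dist(p, cen) for p in c) / len(c) for c, cen in zip(clusters, centroids)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


# Hypothetical toy inputs.
times = [0, 1, 2]
separated = [[[0, 0], [0, 1]], [[10, 10], [10, 11]]]
overlapping = [[[0, 0], [0, 1]], [[0, 0.5], [0, 1.5]]]
```

With `math.dist` as the distance, the well-separated toy clusters score far lower than the overlapping ones (which score exactly 2.0). To score angle-based clusters instead, one could pass `lambda a, b: angle_distance(times, a, b)` as `dist`; two centroids with identical shapes then have distance 0, which this sketch does not guard against.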
Affiliation(s)
- Guanrao Chen
- Department of Computer Science, University of Illinois, Chicago, IL, USA
5
Turchin A, Guo CZ, Adler GK, Ricchiuti V, Kohane IS, Williams GH. Effect of acute aldosterone administration on gene expression profile in the heart. Endocrinology 2006; 147:3183-9. [PMID: 16601137 DOI: 10.1210/en.2005-1674]
Abstract
Aldosterone is known to have a number of direct adverse effects on the heart, including fibrosis and myocardial inflammation. However, the genetic mechanisms of aldosterone action on the heart remain unclear. This paper describes an investigation of temporal changes in the gene expression profile of the whole heart induced by acute administration of a physiologic dose of aldosterone in the mouse. mRNA levels of 34,000 known mouse genes were measured at eight time points after aldosterone administration using oligonucleotide microarrays and compared with those of control animals that underwent a sham injection. A novel software tool (CAGED), designed for the analysis of temporal microarray experiments using a Bayesian approach, was used to identify genes differentially expressed between the aldosterone-injected and control groups. CAGED analysis identified 12 genes with significant differences in their temporal profiles between the two groups. All of these genes showed a decrease in expression level 1-3 h after aldosterone injection, followed by a brief rebound and a return to baseline. These findings were validated by quantitative RT-PCR. The differentially expressed genes included phosphatases, regulators of steroid biosynthesis, inactivators of reactive oxygen species, and structural proteins. Several of these genes are known to functionally mediate biochemical phenomena previously observed to be triggered by aldosterone administration, such as phosphorylation of ERK1/2. These results provide the first description of the cardiac genetic response to aldosterone and identify several potential mediators of known biochemical sequelae of aldosterone administration in the heart.
Affiliation(s)
- Alexander Turchin
- Division of Endocrinology, Brigham and Women's Hospital, 221 Longwood Avenue, Boston, Massachusetts 02115, USA.
6
Ferrazzi F, Magni P, Bellazzi R. Random walk models for Bayesian clustering of gene expression profiles. Appl Bioinformatics 2005; 4:263-76. [PMID: 16309344 DOI: 10.2165/00822942-200504040-00006]
Abstract
The analysis of temporal gene expression profiles is a topic of increasing interest in functional genomics. Model-based clustering methods are particularly interesting because they can capture the dynamic nature of these data and identify the optimal number of clusters. We have defined a new Bayesian method that addresses several important issues left unsolved by currently available approaches: the presence of time dislocations in gene expression, the non-stationarity of the processes generating the data, and the presence of data collected on an irregular temporal grid. Our method, which is based on random walk models, requires only mild a priori assumptions about the nature of the processes generating the data and explicitly models inter-gene variability within each cluster. It was first validated on simulated datasets and then used to analyze a dataset of serum-stimulated fibroblasts. In all cases the results were promising, showing that the method can be helpful in functional genomics research.
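The way a random walk model can accommodate an irregular temporal grid can be sketched in Python: if the increment between consecutive samples is Gaussian with variance proportional to the time elapsed, unevenly spaced measurements need no interpolation. This minimal sketch only computes the log-likelihood of a single profile and its maximum-likelihood variance; the clusters, priors and inter-gene variability of the Bayesian method described above are not modeled, and the function names and profile are hypothetical.

```python
import math


def rw_loglik(times, values, sigma2):
    """Log-likelihood of a profile under a first-order Gaussian random walk:
    x(t_k) - x(t_{k-1}) ~ N(0, sigma2 * (t_k - t_{k-1}))."""
    ll = 0.0
    for t1, t2, v1, v2 in zip(times, times[1:], values, values[1:]):
        var = sigma2 * (t2 - t1)  # wider time gaps allow larger changes
        ll += -0.5 * math.log(2 * math.pi * var) - (v2 - v1) ** 2 / (2 * var)
    return ll


def rw_sigma2_mle(times, values):
    """Maximum-likelihood sigma2: mean of squared increments divided by their time gaps."""
    scaled = [(v2 - v1) ** 2 / (t2 - t1)
              for t1, t2, v1, v2 in zip(times, times[1:], values, values[1:])]
    return sum(scaled) / len(scaled)


# Hypothetical profile sampled on an irregular grid (note the gap between t = 1 and t = 3).
times = [0, 1, 3, 4]
values = [0.0, 1.0, 1.0, 3.0]
sigma2_hat = rw_sigma2_mle(times, values)
```

Here the scaled squared increments are 1, 0 and 4, so the maximum-likelihood variance is 5/3, and the log-likelihood at that value exceeds the log-likelihood at any other variance.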
Affiliation(s)
- Fulvia Ferrazzi
- Dipartimento di Informatica e Sistemistica, Università di Pavia, Pavia, Italy
7
Barry A, Holmes J, Llorà X. Data mining using learning classifier systems. In: Applications of Learning Classifier Systems. 2004. [DOI: 10.1007/978-3-540-39925-4_2]
8
Sarang SS, Yoshida T, Cadet R, Valeras AS, Jensen RV, Gullans SR. Discovery of molecular mechanisms of neuroprotection using cell-based bioassays and oligonucleotide arrays. Physiol Genomics 2002; 11:45-52. [PMID: 12388792 DOI: 10.1152/physiolgenomics.00064.2002]
Abstract
Oxidative injury and the resulting death of neurons is a major pathological factor in numerous neurodegenerative diseases. However, the development of drugs that target this mechanism remains limited. The goal of this study was to screen a library of Food and Drug Administration-approved drugs in a hydrogen peroxide-induced oxidant injury model in neuroblastoma cells. We identified 26 neuroprotective compounds, of which megestrol, meclizine, verapamil, methazolamide, sulindac, and retinol were examined in greater detail. Using large-scale oligonucleotide microarray analysis, we identified genes modulated by these drugs that might underlie the cytoprotection. Five key genes were either uniformly upregulated or uniformly downregulated by all six drug treatments: tissue inhibitor of matrix metalloproteinase 1 (TIMP1), the ret proto-oncogene, clusterin, galanin, and growth-associated protein 43 (GAP43). Exogenous addition of the neuropeptide galanin alone conferred survival on oxidant-stressed cells, comparable to that seen with the drugs. Our approach, which we term "interventional profiling," represents a general and powerful strategy for identifying new bioactive agents for any biological process, as well as the key downstream genes and pathways involved.
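The selection step described above, keeping genes that change in the same direction under every treatment, reduces to a simple filter. Here is a minimal Python sketch. It assumes that each treatment is summarized as a log2 fold change per gene and that a fixed magnitude threshold defines "regulated"; the threshold, gene labels and values are hypothetical and are not the study's data.

```python
def consistently_regulated(log_fold_changes, threshold=1.0):
    """Return (up, down): genes whose log2 fold change passes the (assumed) threshold in the
    same direction under every treatment. `log_fold_changes` maps treatment -> {gene: log2FC}."""
    per_treatment = list(log_fold_changes.values())
    shared = set.intersection(*(set(t) for t in per_treatment))  # genes measured under all treatments
    up, down = [], []
    for gene in sorted(shared):
        values = [t[gene] for t in per_treatment]
        if all(v >= threshold for v in values):
            up.append(gene)
        elif all(v <= -threshold for v in values):
            down.append(gene)
    return up, down


# Hypothetical log2 fold changes for three treatments.
lfc = {
    "drug_1": {"gene_a": 1.5, "gene_b": -2.0, "gene_c": 1.2, "gene_d": 0.3},
    "drug_2": {"gene_a": 1.1, "gene_b": -1.4, "gene_c": -1.3, "gene_d": 2.0},
    "drug_3": {"gene_a": 2.2, "gene_b": -1.0, "gene_c": 1.5},
}
up, down = consistently_regulated(lfc)
```

`gene_c` is excluded because its direction flips under `drug_2`, and `gene_d` is excluded because it was not measured under `drug_3`.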
Affiliation(s)
- Satinder S Sarang
- Biotechnology Center, Center for Neurologic Diseases, Brigham and Women's Hospital, Harvard Medical School, Cambridge, Massachusetts 02139, USA
9
Fofanov Y, Pettitt BM. Reconstruction of the genetic regulatory dynamics of the rat spinal cord development: local invariants approach. J Biomed Inform 2002; 35:343-51. [PMID: 12968783 DOI: 10.1016/s1532-0464(03)00035-2]
Abstract
Recently, many attempts have been made to describe the temporal dynamics of gene expression with systems of differential equations. This is fraught with difficulty, given the current level of experimental understanding. Our method of Incomplete Modeling using Local Invariants offers another way to extract useful information about regulation in genetic networks, although at the price of not being able to construct a complete model of the whole system. In this approach, we look for a set of simple models describing the algebraic or differential relations among just a few variables (genes, in this case) that fit the experimental data with the required accuracy. In the present work, we apply this method to the expression time profiles of 112 genes from rat spinal cord development experiments. We found that many different types of Local Invariants exist in this dataset. Moreover, we also found some isolated, self-contained subsystems whose behavior can be described by closed systems of differential equations.
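The search for local invariants can be illustrated in its simplest algebraic form by testing every pair of genes for a linear relation y = a*x + b that fits all time points within a tolerance. This minimal Python sketch covers only that case. The method described above also considers differential relations and relations among more than two variables, and the tolerance, gene names and profiles here are hypothetical.

```python
from itertools import combinations


def linear_fit(x, y):
    """Least-squares line y = a*x + b; returns None if x is constant."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        return None
    a = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    return a, y_mean - a * x_mean


def pairwise_linear_invariants(profiles, tol):
    """Gene pairs whose profiles satisfy y = a*x + b at every time point within `tol`."""
    found = []
    for g1, g2 in combinations(sorted(profiles), 2):
        fit = linear_fit(profiles[g1], profiles[g2])
        if fit is None:
            continue
        a, b = fit
        max_error = max(abs(y - (a * x + b)) for x, y in zip(profiles[g1], profiles[g2]))
        if max_error <= tol:
            found.append((g1, g2, a, b))
    return found


# Hypothetical expression time profiles (four time points).
profiles = {"g1": [0, 1, 2, 3], "g2": [1, 3, 5, 7], "g3": [0, 1, 0, 1]}
invariants = pairwise_linear_invariants(profiles, tol=0.1)
```

Only the pair (g1, g2) qualifies, with g2 = 2*g1 + 1. The oscillating g3 fits neither gene within the tolerance (its worst residual is 0.6).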
Affiliation(s)
- Yuriy Fofanov
- Bioinformatics Research Team, Molecular Therapy Research Center, Department of Computer Science, University of Houston, 501 PGH Hall, 4800 Calhoun Road, Houston, TX 77204-3010, USA.
10
Ramoni MF, Sebastiani P, Kohane IS. Cluster analysis of gene expression dynamics. Proc Natl Acad Sci U S A 2002; 99:9121-6. [PMID: 12082179 PMCID: PMC123104 DOI: 10.1073/pnas.132656399] [Received: 12/10/2001]
Abstract
This article presents a Bayesian method for model-based clustering of gene expression dynamics. The method represents gene-expression dynamics as autoregressive equations and uses an agglomerative procedure to search for the most probable set of clusters given the available data. The main contributions of this approach are the ability to take into account the dynamic nature of gene expression time series during clustering and a principled way to identify the number of distinct clusters. As the number of possible clustering models grows exponentially with the number of observed time series, we have devised a distance-based heuristic search procedure able to render the search process feasible. In this way, the method retains the important visualization capability of traditional distance-based clustering and acquires an independent, principled measure to decide when two series are different enough to belong to different clusters. The reliance of this method on an explicit statistical representation of gene expression dynamics makes it possible to use standard statistical techniques to assess the goodness of fit of the resulting model and validate the underlying assumptions. A set of gene-expression time series, collected to study the response of human fibroblasts to serum, is used to identify the properties of the method.
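The combination of an autoregressive description of each cluster with a distance-ordered agglomerative search can be sketched in Python. The sketch makes two assumptions that differ from the method described above: each cluster is modeled by a single pooled first-order autoregression, and BIC, a simpler criterion, replaces the Bayesian marginal likelihood when deciding whether to merge. Candidate merges are tried in order of Euclidean distance between cluster mean profiles, and a merge is kept only if it lowers the BIC. Function names and series are hypothetical.

```python
import math


def ar1_bic(cluster):
    """BIC of one pooled AR(1) model y_t = c + phi * y_{t-1} + e_t fitted to all series in a cluster."""
    pairs = [(s[t - 1], s[t]) for s in cluster for t in range(1, len(s))]
    n = len(pairs)
    x_mean = sum(x for x, _ in pairs) / n
    y_mean = sum(y for _, y in pairs) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in pairs)
    phi = sum((x - x_mean) * (y - y_mean) for x, y in pairs) / sxx if sxx > 0 else 0.0
    c = y_mean - phi * x_mean
    rss = sum((y - c - phi * x) ** 2 for x, y in pairs)
    sigma2 = max(rss / n, 1e-12)  # ML noise variance, floored to avoid log(0) on perfect fits
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return -2 * loglik + 3 * math.log(n)  # parameters: c, phi, sigma2


def centroid(cluster):
    """Pointwise mean profile of a cluster of equal-length series."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]


def cluster_series(series):
    """Greedy agglomeration: try merges from the closest pair of mean profiles outward and
    accept the first one that lowers the total BIC; stop when no merge helps."""
    clusters = [[list(s)] for s in series]
    while len(clusters) > 1:
        candidates = sorted(
            (math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        for _, i, j in candidates:
            merged = clusters[i] + clusters[j]
            if ar1_bic(merged) < ar1_bic(clusters[i]) + ar1_bic(clusters[j]):
                clusters[i] = merged
                del clusters[j]  # j > i, so index i is unaffected
                break
        else:
            break  # no candidate merge improves the BIC
    return clusters


# Hypothetical series: two identical oscillating profiles and one decaying profile.
alternating = [1, 2, 1, 2, 1, 2, 1, 2]
decaying = [10 * 0.5 ** t for t in range(8)]
result = cluster_series([alternating, alternating, decaying])
```

The two oscillating series share one AR(1) model and merge. Adding the decaying series would force a single autoregression onto incompatible dynamics and raise the BIC, so it stays in its own cluster. Ordering candidates by distance is what keeps the search feasible, since the number of possible partitions grows exponentially with the number of series.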
Affiliation(s)
- Marco F Ramoni
- Children's Hospital Informatics Program, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
11
Abstract
Pharmacogenomics requires the integration and analysis of genomic, molecular, cellular, and clinical data, and it thus offers a remarkable set of challenges to biomedical informatics. These include infrastructural challenges such as the creation of data models and databases for storing these data, the integration of these data with external databases, the extraction of information from natural language text, and the protection of databases with sensitive information. There are also scientific challenges in creating tools to support gene expression analysis, three-dimensional structural analysis, and comparative genomic analysis. In this review, we summarize the current uses of informatics within pharmacogenomics and show how the technical challenges that remain for biomedical informatics are typical of those that will be confronted in the postgenomic era.
Affiliation(s)
- Russ B Altman
- Stanford Medical Informatics, Stanford, California 94305-5479, USA.
12
Butte AJ, Bao L, Reis BY, Watkins TW, Kohane IS. Comparing the similarity of time-series gene expression using signal processing metrics. J Biomed Inform 2001; 34:396-405. [PMID: 12198759 DOI: 10.1006/jbin.2002.1037]
Abstract
Many algorithms have been used to cluster genes measured by microarray across a time series. Instead of clustering, our goal was to compare all pairs of genes to determine whether there was evidence of a phase shift between them. We describe a technique in which gene expression is treated as a discrete time-invariant signal, allowing the use of digital signal-processing tools, including power spectral density, coherence, and transfer gain and phase shift. We applied these tools to a public RNA expression data set of 2467 genes measured every 7 min for 119 min and found 18 putative associations. Two of these were known in the biomedical literature and might have been missed using correlation coefficients. Digital signal-processing tools can be embedded in existing clustering algorithms and can enhance them.
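If each expression profile is treated as a discrete signal, the phase shift between two genes at a given frequency can be read from their cross-spectrum. This minimal Python sketch assumes evenly spaced samples and evaluates a plain discrete Fourier transform at a single frequency bin k. It omits the spectral smoothing and coherence estimates that a full analysis would use, and the function names are illustrative. Sampling every 7 min for 119 min gives 119/7 + 1 = 18 time points, so bin k corresponds to a period of 18 * 7 / k min.

```python
import cmath
import math


def dft_coefficient(x, k):
    """k-th discrete Fourier coefficient of a real sequence, after removing its mean."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))


def phase_shift(x, y, k):
    """Phase (radians) of the cross-spectrum X_k * conj(Y_k); a positive value means y lags x."""
    return cmath.phase(dft_coefficient(x, k) * dft_coefficient(y, k).conjugate())


def phase_to_minutes(phase, k, n, dt_minutes):
    """Convert a phase at bin k into a time lag, given n samples spaced dt_minutes apart."""
    period = n * dt_minutes / k
    return phase / (2 * math.pi) * period


n, dt, k = 18, 7, 2  # 18 samples 7 min apart; bin 2 has a 63-min period
gene_x = [math.cos(2 * math.pi * k * t / n) for t in range(n)]
gene_y = [math.cos(2 * math.pi * k * t / n - 0.5) for t in range(n)]  # hypothetical: lags gene_x by 0.5 rad
shift = phase_shift(gene_x, gene_y, k)
```

The recovered shift is 0.5 rad. At bin 2 a quarter-cycle shift (pi/2) corresponds to a 15.75-min lag.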
Affiliation(s)
- A J Butte
- Children's Hospital Informatics Program, 300 Longwood Avenue, Boston, Massachusetts 02115, USA.
13
Liu H, Lussier YA, Friedman C. Disambiguating ambiguous biomedical terms in biomedical narrative text: an unsupervised method. J Biomed Inform 2001; 34:249-61. [PMID: 11977807 DOI: 10.1006/jbin.2001.1023]
Abstract
With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method is needed that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context. Currently, word sense disambiguation (WSD) in the biomedical domain relies on handcrafted rules based on contextual material. The disadvantages of this approach are that (i) generating WSD rules manually is time-consuming and tedious, (ii) rule sets become increasingly difficult to maintain over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains with specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using that sense-tagged corpus as a training set. A formative experiment showed that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.
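The two-phase idea can be sketched in Python. The abstract does not say how the sense-tagged corpus is built, so this sketch assumes one simple mechanism: each sense of W has an unambiguous single-word synonym, and every sentence containing that synonym is relabeled as an instance of W with that sense. Phase two then trains a multinomial naive Bayes classifier with add-one smoothing on the context words. Naive Bayes is a common choice here, not necessarily the paper's. The term, synonyms and sentences are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict


def build_sense_tagged_corpus(sentences, sense_synonyms, term):
    """Phase 1 (assumed mechanism): a sentence containing an unambiguous synonym of one sense
    becomes a training example for that sense, with the synonym replaced by the ambiguous term."""
    corpus = []
    for sentence in sentences:
        words = sentence.lower().split()
        for sense, synonym in sense_synonyms.items():
            if synonym in words:
                corpus.append((sense, [term if w == synonym else w for w in words]))
                break
    return corpus


def train_naive_bayes(corpus):
    """Count senses and the context words seen with each sense."""
    sense_counts = Counter(sense for sense, _ in corpus)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for sense, words in corpus:
        word_counts[sense].update(words)
        vocabulary.update(words)
    return sense_counts, word_counts, vocabulary


def classify(model, context):
    """Phase 2: choose the sense maximizing log P(sense) + sum of log P(word | sense), add-one smoothed."""
    sense_counts, word_counts, vocabulary = model
    total = sum(sense_counts.values())

    def score(sense):
        denominator = sum(word_counts[sense].values()) + len(vocabulary)
        return math.log(sense_counts[sense] / total) + sum(
            math.log((word_counts[sense][w] + 1) / denominator) for w in context)

    return max(sense_counts, key=score)


# Hypothetical toy corpus for the ambiguous term "cold".
sentences = [
    "patient with coryza and cough",
    "coryza with fever and cough",
    "chilly room temperature today",
    "the water felt chilly temperature",
]
corpus = build_sense_tagged_corpus(sentences, {"illness": "coryza", "low_temperature": "chilly"}, "cold")
model = train_naive_bayes(corpus)
```

With this toy model, a context mentioning "cough" and "fever" is tagged as the illness sense, and one mentioning "room" and "temperature" as the temperature sense. No manually written rules are involved.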
Affiliation(s)
- H Liu
- Computer Science Division, Graduate School and University Center, City University of New York, New York, New York 10016, USA.