1
Principal Component Analysis applied directly to Sequence Matrix. Sci Rep 2019; 9:19297. [PMID: 31848355 PMCID: PMC6917774 DOI: 10.1038/s41598-019-55253-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/29/2019] [Accepted: 11/23/2019] [Indexed: 11/08/2022] Open
Abstract
Sequence data is now widely used to observe relationships among organisms. However, understanding the structure of such qualitative data is challenging. Conventionally, the relationships are analysed using a dendrogram that estimates a tree shape. With this approach it is difficult to verify that a tree shape is appropriate; horizontal gene transfer and mating can instead give the relationships the shape of a network. As a connection-free approach, principal component analysis (PCA) is used to summarize the distance matrix, which records the distance between each pair of samples. However, this approach is limited in how it treats the information in sequence motifs: distances caused by different motifs are mixed together, which hides clues as to how the samples differ. As any base may change independently, a sequence is essentially multivariate data. Hence, the differences among samples and the bases that contribute to those differences should be observed simultaneously. To achieve this, the sequence matrix is converted to Boolean vectors and analysed directly using PCA. The effects are confirmed in the diversity of the Asiatic lion and of humans, as well as in environmental DNA. The resolution of samples and the robustness of the calculation are improved. The relationship between a direction of difference and the causative nucleotides becomes obvious at a glance.
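As a rough illustration of the idea in this abstract, the sketch below one-hot encodes a few aligned toy sequences into a Boolean matrix and runs PCA on it through a singular value decomposition. The toy sequences, the function names, and the plain mean-centering are assumptions made for illustration; the paper's own encoding details and scaling may differ.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed alphabet; other symbols (e.g. gaps) are left unset


def one_hot_sequences(seqs, alphabet=ALPHABET):
    """Encode equal-length aligned sequences as a Boolean matrix.

    Rows are samples; each alignment position contributes one column
    per letter of the alphabet.
    """
    n, length = len(seqs), len(seqs[0])
    mat = np.zeros((n, length * len(alphabet)), dtype=bool)
    for i, seq in enumerate(seqs):
        for j, base in enumerate(seq):
            k = alphabet.find(base)
            if k >= 0:
                mat[i, j * len(alphabet) + k] = True
    return mat


def pca(mat, n_components=2):
    """PCA by SVD of the column-centered matrix.

    Returns sample scores and column loadings for the leading components.
    """
    x = mat.astype(float)
    x -= x.mean(axis=0)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    return scores, loadings


# Hypothetical toy alignment: the samples differ at positions 1 and 4 only.
toy = ["ACGTAC", "ACGTAC", "ACGAAC", "TCGAAC"]
bool_mat = one_hot_sequences(toy)
scores, loadings = pca(bool_mat)

# Columns with the largest absolute loading on PC1, labeled position:base.
top = np.argsort(-np.abs(loadings[:, 0]))[:4]
labels = [f"pos{c // len(ALPHABET) + 1}:{ALPHABET[c % len(ALPHABET)]}" for c in top]
print(scores)
print(labels)
```

Because each column corresponds to one base at one position, the loadings point back to the nucleotides that drive a given direction of separation, which is the property the abstract emphasizes over distance-matrix PCA.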
2
Struck TH, Wey-Fabrizius AR, Golombek A, Hering L, Weigert A, Bleidorn C, Klebow S, Iakovenko N, Hausdorf B, Petersen M, Kück P, Herlyn H, Hankeln T. Platyzoan paraphyly based on phylogenomic data supports a noncoelomate ancestry of spiralia. Mol Biol Evol 2014; 31:1833-49. [PMID: 24748651 DOI: 10.1093/molbev/msu143] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Indexed: 12/24/2022] Open
Abstract
Based on molecular data three major clades have been recognized within Bilateria: Deuterostomia, Ecdysozoa, and Spiralia. Within Spiralia, small-sized and simply organized animals such as flatworms, gastrotrichs, and gnathostomulids have recently been grouped together as Platyzoa. However, the representation of putative platyzoans was low in the respective molecular phylogenetic studies, in terms of both, taxon number and sequence data. Furthermore, increased substitution rates in platyzoan taxa raised the possibility that monophyletic Platyzoa represents an artifact due to long-branch attraction. In order to overcome such problems, we employed a phylogenomic approach, thereby substantially increasing 1) the number of sampled species within Platyzoa and 2) species-specific sequence coverage in data sets of up to 82,162 amino acid positions. Using established and new measures (long-branch score), we disentangled phylogenetic signal from misleading effects such as long-branch attraction. In doing so, our phylogenomic analyses did not recover a monophyletic origin of platyzoan taxa that, instead, appeared paraphyletic with respect to the other spiralians. Platyhelminthes and Gastrotricha formed a monophylum, which we name Rouphozoa. To the exclusion of Gnathifera, Rouphozoa and all other spiralians represent a monophyletic group, which we name Platytrochozoa. Platyzoan paraphyly suggests that the last common ancestor of Spiralia was a simple-bodied organism lacking coelomic cavities, segmentation, and complex brain structures, and that more complex animals such as annelids evolved from such a simply organized ancestor. This conclusion contradicts alternative evolutionary scenarios proposing an annelid-like ancestor of Bilateria and Spiralia and several independent events of secondary reduction.
Affiliation(s)
- Torsten H Struck
  - Zoological Research Museum Alexander Koenig, Bonn, Germany; University of Osnabrück, FB05 Biology/Chemistry, AG Zoology, Osnabrück, Germany
- Alexandra R Wey-Fabrizius
  - Institute of Molecular Genetics, Biosafety Research and Consulting, Johannes Gutenberg University, Mainz, Germany
- Anja Golombek
  - Zoological Research Museum Alexander Koenig, Bonn, Germany
- Lars Hering
  - Animal Evolution and Development, Institute of Biology II, University of Leipzig, Leipzig, Germany
- Anne Weigert
  - Molecular Evolution and Systematics of Animals, Institute of Biology, University of Leipzig, Leipzig, Germany
- Christoph Bleidorn
  - Molecular Evolution and Systematics of Animals, Institute of Biology, University of Leipzig, Leipzig, Germany
- Sabrina Klebow
  - Institute of Molecular Genetics, Biosafety Research and Consulting, Johannes Gutenberg University, Mainz, Germany
- Nataliia Iakovenko
  - Department of Biology and Ecology, Ostravian University in Ostrava, Ostrava, Czech Republic; Department of Invertebrate Fauna and Systematics, Schmalhausen Institute of Zoology NAS of Ukraine, Kyiv, Ukraine
- Malte Petersen
  - Zoological Research Museum Alexander Koenig, Bonn, Germany
- Patrick Kück
  - Zoological Research Museum Alexander Koenig, Bonn, Germany
- Holger Herlyn
  - Institute of Anthropology, Johannes Gutenberg University, Mainz, Germany
- Thomas Hankeln
  - Institute of Molecular Genetics, Biosafety Research and Consulting, Johannes Gutenberg University, Mainz, Germany
3
Montano V, Marcari V, Pavanello M, Anyaele O, Comas D, Destro-Bisol G, Batini C. The influence of habitats on female mobility in Central and Western Africa inferred from human mitochondrial variation. BMC Evol Biol 2013; 13:24. [PMID: 23360301 PMCID: PMC3605107 DOI: 10.1186/1471-2148-13-24] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/03/2012] [Accepted: 01/25/2013] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND: When studying the genetic structure of human populations, the role of cultural factors may be difficult to ascertain due to a lack of formal models. Linguistic diversity is a typical example of such a situation. Patrilocality, on the other hand, can be integrated into a biological framework, allowing the formulation of explicit working hypotheses. The present study is based on the assumption that patrilocal traditions make the hypervariable region I of the mtDNA a valuable tool for the exploration of migratory dynamics, offering the opportunity to explore the relationships between genetic and linguistic diversity. We studied 85 Niger-Congo-speaking patrilocal populations that cover regions from Senegal to Central African Republic. A total of 4175 individuals were included in the study. RESULTS: By combining a multivariate analysis aimed at investigating the population genetic structure, with a Bayesian approach used to test models and extent of migration, we were able to detect a stepping-stone migration model as the best descriptor of gene flow across the region, with the main discontinuities corresponding to forested areas. CONCLUSIONS: Our analyses highlight an aspect of the influence of habitat variation on human genetic diversity that has yet to be understood. Rather than depending simply on geographic linear distances, patterns of female genetic variation vary substantially between savannah and rainforest environments. Our findings may be explained by the effects of recent gene flow constrained by environmental factors, which superimposes on a background shaped by pre-agricultural peopling.
Affiliation(s)
- Valeria Montano
  - Dipartimento di Biologia Ambientale, Sapienza Università di Roma, P.le Aldo Moro 5, 00185, Rome, Italy.
5
Kato T, Fuku N, Noguchi Y, Murakami H, Miyachi M, Kimura Y, Tanaka M, Kitamura K. Mitochondrial DNA haplogroup associated with hereditary hearing loss in a Japanese population. Acta Otolaryngol 2012; 132:1178-82. [PMID: 22830575 DOI: 10.3109/00016489.2012.693624] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
CONCLUSION: Haplogroup D4b, especially subhaplogroup D4b2, may be one of the modifiers associated with the phenotypic expression of hereditary hearing loss (HL). OBJECTIVES: The present study investigated the association between suspected hereditary HL and 12 major mtDNA haplogroups in a Japanese population. Besides the mutations of mitochondrial DNA, many modifiers including environmental factors and genetic polymorphisms are involved in HL. METHODS: The subjects comprised 373 unrelated Japanese patients with suspected hereditary HL and 480 controls. Twenty of the 373 patients were excluded from the study because the m.1555A>G or the m.3243A>G mutation had been detected in them. The mitochondrial haplotypes were classified into 12 major Japanese haplogroups (i.e. F, B, A, N9a, N9b, M7a, M7b, G1, G2, D4a, D4b, and D5). The frequency of each haplogroup in patients with HL was compared with that of the controls using the chi-squared test. RESULTS: The frequency of the HL patients carrying the mitochondrial haplogroup D4b was significantly higher than that of the controls (37/353 [10.5%] vs 31/480 [6.5%]; OR 1.70 [95% CI 1.03-2.79, p = 0.036]) and evidence for enhancement was found in subhaplogroup D4b2 (32/353 [9.1%] vs 24/480 [5%], OR 1.89 [95% CI 1.09-3.28, p = 0.021]).
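The odds ratios quoted in this abstract can be reproduced from the reported counts. The sketch below is a worked check, not the authors' procedure: the abstract does not say how the confidence intervals were obtained, so the log-based (Woolf) interval with z = 1.96 used here is an assumption, and the function name is hypothetical.

```python
import math


def odds_ratio_ci(case_pos, case_total, ctrl_pos, ctrl_total, z=1.96):
    """Odds ratio with a log-based (Woolf) confidence interval.

    The interval method is an assumption; the abstract does not name one.
    """
    a, b = case_pos, case_total - case_pos   # carriers / non-carriers, patients
    c, d = ctrl_pos, ctrl_total - ctrl_pos   # carriers / non-carriers, controls
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts taken from the abstract: haplogroup D4b, then subhaplogroup D4b2.
d4b = odds_ratio_ci(37, 353, 31, 480)
d4b2 = odds_ratio_ci(32, 353, 24, 480)

print([round(v, 2) for v in d4b])   # [1.7, 1.03, 2.79]
print([round(v, 2) for v in d4b2])  # [1.89, 1.09, 3.28]
```

Under these assumptions the rounded values agree with the odds ratios and 95% confidence intervals stated in the abstract.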
Affiliation(s)
- Tomofumi Kato
  - Otolaryngology, Tokyo Metropolitan Geriatric Hospital, Tokyo, Japan.
6
Seiler M, Huang CC, Szalma S, Bhanot G. ConsensusCluster: a software tool for unsupervised cluster discovery in numerical data. OMICS: A Journal of Integrative Biology 2010; 14:109-13. [PMID: 20141333 DOI: 10.1089/omi.2009.0083] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 11/13/2022]
Abstract
We have created a stand-alone software tool, ConsensusCluster, for the analysis of high-dimensional single nucleotide polymorphism (SNP) and gene expression microarray data. Our software implements the consensus clustering algorithm and principal component analysis to stratify the data into a given number of robust clusters. The robustness is achieved by combining clustering results from data and sample resampling as well as by averaging over various algorithms and parameter settings to achieve accurate, stable clustering results. We have implemented several different clustering algorithms in the software, including K-Means, Partition Around Medoids, Self-Organizing Map, and Hierarchical clustering methods. After clustering the data, ConsensusCluster generates a consensus matrix heatmap to give a useful visual representation of cluster membership, and automatically generates a log of selected features that distinguish each pair of clusters. ConsensusCluster gives more robust and more reliable clusters than common software packages and, therefore, is a powerful unsupervised learning tool that finds hidden patterns in data that might shed light on its biological interpretation. This software is free and available from http://code.google.com/p/consensus-cluster.
Affiliation(s)
- Michael Seiler
  - BioMaPS Institute, Rutgers University, Piscataway, New Jersey 08854, USA
7
Solovyov A, Palacios G, Briese T, Lipkin WI, Rabadan R. Cluster analysis of the origins of the new influenza A(H1N1) virus. Euro Surveill 2009; 14:19224. [PMID: 19480812 PMCID: PMC4310691 DOI: 10.2807/ese.14.21.19224-en] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022] Open
Abstract
In March and April 2009, a new strain of influenza A(H1N1) virus was isolated in Mexico and the United States. Since the initial reports, more than 10,000 cases have been reported to the World Health Organization from all around the world. Several hundred isolates have already been sequenced and deposited in public databases. We have studied the genetics of the new strain and identified its closest relatives through a cluster analysis approach. We show that the new virus combines genetic information related to different swine influenza viruses. Segments PB2, PB1, PA, HA, NP and NS are related to swine H1N2 and H3N2 influenza viruses isolated in North America. Segments NA and M are related to swine influenza viruses isolated in Eurasia.
Affiliation(s)
- A Solovyov
  - Physics Department, Princeton University, Princeton, United States
- G Palacios
  - Center for Infection and Immunity, Mailman School of Public Health, Columbia University, New York, United States
- T Briese
  - Center for Infection and Immunity, Mailman School of Public Health, Columbia University, New York, United States
- W I Lipkin
  - Center for Infection and Immunity, Mailman School of Public Health, Columbia University, New York, United States
- R Rabadan
  - Department of Biomedical Informatics, Center for Computational Biology and Bioinformatics, Columbia University College of Physicians and Surgeons, New York, United States