1
Santus L, Garriga E, Deorowicz S, Gudyś A, Notredame C. Towards the accurate alignment of over a million protein sequences: Current state of the art. Curr Opin Struct Biol 2023; 80:102577. [PMID: 37012200 DOI: 10.1016/j.sbi.2023.102577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 02/21/2023] [Accepted: 02/27/2023] [Indexed: 04/04/2023]
Abstract
Large-scale genomics requires highly scalable and accurate multiple sequence alignment methods. Results collected over the last decade suggest that accuracy is lost when scaling up beyond a few thousand sequences. This issue has been actively addressed with a number of innovative algorithmic solutions that combine low-level hardware optimization with novel higher-level heuristics. This review provides an extensive critical overview of these recent methods. Using established reference datasets, we conclude that although significant progress has been achieved, a unified framework able to consistently and efficiently produce high-accuracy large-scale multiple alignments is still lacking.
2
He J, He Z, Yang D, Ma Z, Chen H, Zhang Q, Deng F, Ye L, Pu Y, Zhang M, Yang S, Yang S, Yan T. Genetic Variation in Schizothorax kozlovi Nikolsky in the Upper Reaches of the Chinese Yangtze River Based on Genotyping for Simplified Genome Sequencing. Animals (Basel) 2022; 12:2181. [PMID: 36077902 PMCID: PMC9454844 DOI: 10.3390/ani12172181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 08/16/2022] [Accepted: 08/19/2022] [Indexed: 11/23/2022] Open
Abstract
Simple Summary: Schizothorax kozlovi Nikolsky is a unique cold-water fish in the upper reaches of the Yangtze River in China and has high economic value. In our study, genetic diversity and population structure analyses were performed on seven wild populations in the upper reaches of the Yangtze River by genotyping by sequencing (GBS). The results indicate that the populations of S. kozlovi have different degrees of tolerance and selection pressure in response to temperature and altitude. The Wujiang population was genetically differentiated from the Jinsha River and Yalong River populations. The Wujiang intrapopulation has greater genetic diversity and differentiation than the Jinsha River and Yalong River populations, which demonstrates that the Jinsha and Yalong populations require more attention and resources for their protection. The results of this study will increase our understanding of the diversity of S. kozlovi in the upper reaches of the Yangtze River and provide a basis for the conservation and utilization of wild resources.
Abstract: Schizothorax kozlovi Nikolsky is a unique cold-water fish in the upper reaches of the Yangtze River in China and has high economic value. In our study, genetic diversity and population structure analyses were performed on seven wild populations (originating from the Jinsha River, Yalong River, and Wujiang River) in the upper reaches of the Yangtze River by genotyping by sequencing (GBS). A total of 303,970 single-nucleotide polymorphisms (SNPs) were identified from the seven wild populations. Lower genetic diversity was exhibited among the intrapopulations of the three tributaries, and the Wujiang River population had significant genetic differentiation when compared to the Jinsha River and Yalong River populations. Furthermore, the selected SNPs were enriched in cellular processes, environmental adaptation, signal transduction, and related metabolic processes between the Wujiang population and the other two populations. These results indicate that the populations of S. kozlovi have different degrees of tolerance and selection pressure in response to temperature and altitude. The Wujiang intrapopulation has greater genetic diversity and differentiation than the Jinsha River and Yalong River populations, which demonstrates that the Jinsha and Yalong populations require more attention and resources for their protection. The results of this study will increase our understanding of the diversity of S. kozlovi in the upper reaches of the Yangtze River and provide a basis for the conservation and utilization of wild resources.
3
Kioukis A, Pourjam M, Neuhaus K, Lagkouvardos I. Taxonomy Informed Clustering, an Optimized Method for Purer and More Informative Clusters in Diversity Analysis and Microbiome Profiling. Front Bioinform 2022; 2:864597. [PMID: 36304326 PMCID: PMC9580952 DOI: 10.3389/fbinf.2022.864597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Accepted: 03/31/2022] [Indexed: 11/13/2022] Open
Abstract
Bacterial diversity is often analyzed using 16S rRNA gene amplicon sequencing. Commonly, sequences are clustered based on similarity cutoffs to obtain groups reflecting molecular species, genera, or families. Due to the amount of sequencing data generated, greedy algorithms are preferred for their time efficiency. Such algorithms rely only on pairwise sequence similarities. Thus, sequences with diverse phylogenetic backgrounds are sometimes clustered together. In contrast, taxonomic classifiers use position-specific taxonomic information in assigning a probable taxonomy to a given sequence. Here we introduce Taxonomy Informed Clustering (TIC), a novel approach that utilizes classifier-assigned taxonomy to restrict clustering to only those sequences that share the same taxonomic path. Based on this concept, we offer a complete and automated pipeline for processing of 16S rRNA amplicon datasets in diversity analyses. First, raw reads are processed to form denoised amplicons. Next, the denoised amplicons are taxonomically classified. Finally, the TIC algorithm progressively assigns clusters at molecular species, genus and family levels. TIC outperforms greedy clustering algorithms like USEARCH and VSEARCH in terms of clusters’ purity and entropy, when using data from the Living Tree Project as test samples. Furthermore, we applied TIC on a dataset containing all Bifidobacteriaceae-classified sequences from the IMNGS database. Here, TIC identified evidence for 1000s of novel molecular genera and species. These results highlight the straightforward application of the TIC pipeline and superior results compared to former methods in diversity studies. The pipeline is freely available at: https://github.com/Lagkouvardos/TIC.
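The core idea in this abstract, restricting greedy clustering to sequences that share a taxonomic path, can be illustrated with a minimal sketch. This is not the TIC pipeline's code: the function names, the record layout, the toy `identity` measure (position-wise matches, no alignment), and the 0.97 default threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions (no alignment).

    Hypothetical stand-in for a real alignment-based identity.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def taxonomy_restricted_clusters(records, threshold=0.97):
    """Greedy centroid clustering applied separately within each taxonomic path.

    records: iterable of (seq_id, sequence, taxonomic_path) tuples.
    Returns a list of clusters, each a list of seq_ids.
    """
    # Step 1: partition sequences by their classifier-assigned path.
    by_path = defaultdict(list)
    for seq_id, seq, path in records:
        by_path[path].append((seq_id, seq))

    clusters = []
    # Step 2: greedy clustering inside each partition only, so sequences
    # with different taxonomic paths can never share a cluster.
    for path in sorted(by_path):
        centroids = []  # list of (centroid_sequence, member_ids)
        for seq_id, seq in by_path[path]:
            for centroid_seq, members in centroids:
                if identity(seq, centroid_seq) >= threshold:
                    members.append(seq_id)
                    break
            else:
                # No centroid was similar enough: start a new cluster.
                centroids.append((seq, [seq_id]))
        clusters.extend(members for _, members in centroids)
    return clusters
```

In this sketch, two identical sequences with different assigned paths end up in separate clusters, which a purely similarity-based greedy method would merge.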
Affiliation(s)
- Mohsen Pourjam
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technical University Munich, Freising, Germany
- Klaus Neuhaus
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technical University Munich, Freising, Germany
- Ilias Lagkouvardos
  - Core Facility Microbiome, ZIEL – Institute for Food & Health, Technical University Munich, Freising, Germany
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Greece
  - *Correspondence: Ilias Lagkouvardos,
4
Hong Y, Guo M, Wang J. ENJ algorithm can construct triple phylogenetic trees. Mol Ther Nucleic Acids 2020; 23:286-293. [PMID: 33425487 PMCID: PMC7779534 DOI: 10.1016/j.omtn.2020.11.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/19/2020] [Accepted: 11/05/2020] [Indexed: 11/20/2022]
Abstract
Phylogenetic analysis is used to analyze the evolution of species according to the characteristics of biological sequences. The analytical results are generally represented by phylogenetic trees. NJ (neighbor joining), which is based on the distance between taxa, is a frequently used algorithm for constructing phylogenetic trees because of its few assumptions, fast operation, and high accuracy. It is known that NJ usually constructs different phylogenetic trees for the same dataset when the input order differs; these are known as “tied trees.” This article proposes an improved NJ method, called ENJ (extended neighbor joining). ENJ can join several (currently limited to three) nodes with the same minimum distance into a new node, rather than joining two nodes in one iteration, so it can construct triple phylogenetic trees. We have derived the formulas for updating the distance values and calculating the branch lengths for the ENJ algorithm. We have tested ENJ with simulated and real data. The experimental results show that, compared with other methods, the trees constructed by ENJ have greater similarity to the initial trees, and ENJ is much faster than the NJ algorithm. Moreover, we have constructed a phylogenetic tree for the novel coronavirus (COVID-19) and related coronaviruses by ENJ, which shows that COVID-19 and SARS-CoV are closer than other coronaviruses. Because it differs from the existing phylogenetic trees for those coronaviruses, we constructed a phylogenetic network for them. The network shows those species have had a reticulate evolution.
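The tie problem this abstract mentions can be made concrete with the standard NJ selection criterion, Q(i, j) = (n − 2)·d(i, j) − Σ_k d(i, k) − Σ_k d(j, k). The sketch below is only the classic pair-selection step with explicit tie detection; it does not implement ENJ's three-node join or its distance-update formulas, which the abstract does not give. The function name `nj_min_pairs` and the tolerance `tol` are assumptions.

```python
def nj_min_pairs(dist, tol=1e-12):
    """Return (min_Q, pairs) for one neighbor-joining selection step.

    dist: symmetric distance matrix as a list of lists with a zero diagonal.
    pairs: every (i, j) with i < j whose Q value ties for the minimum.
    More than one pair means the resulting tree can depend on which
    tied pair an implementation happens to pick first.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Standard NJ criterion for joining taxa i and j.
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best - tol:
                best, pairs = q, [(i, j)]
            elif abs(q - best) <= tol:
                pairs.append((i, j))
    return best, pairs
```

When `pairs` has a single entry the step is unambiguous; with several entries, plain NJ must break the tie arbitrarily, which is the situation the abstract describes ENJ as handling by joining tied nodes together.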
Affiliation(s)
- Yan Hong
  - School of Computer Science, Inner Mongolia University, Hohhot 010021, P.R. China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, P.R. China
  - Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044, P.R. China
- Juan Wang
  - School of Computer Science, Inner Mongolia University, Hohhot 010021, P.R. China
  - State Key Laboratories of Reproductive Regulation & Breeding of Grassland Livestock, Hohhot 010021, Inner Mongolia, P.R. China
|
5
Hameed M, Liu K, Anwar MN, Wahaab A, Li C, Di D, Wang X, Khan S, Xu J, Li B, Nawaz M, Shao D, Qiu Y, Wei J, Ma Z. A viral metagenomic analysis reveals rich viral abundance and diversity in mosquitoes from pig farms. Transbound Emerg Dis 2019; 67:328-343. [PMID: 31512812 DOI: 10.1111/tbed.13355] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/11/2019] [Revised: 08/02/2019] [Accepted: 09/03/2019] [Indexed: 12/14/2022]
Abstract
Mosquitoes harbour a diversity of viruses and are responsible for several mosquito-borne viral diseases of humans and animals, thereby leading to major public health concerns and significant economic losses across the globe. Viral metagenomics offers a great opportunity for bulk analysis of viral genomes retrieved directly from environmental samples. In this study, we performed a viral metagenomic analysis of five pools of mosquitoes belonging to Aedes, Anopheles and Culex species, collected from different pig farms in the vicinity of Shanghai, China, to explore the viral community carried by mosquitoes. The resulting metagenomic data revealed that the viral community in the mosquitoes was highly diverse and varied in abundance among pig farms; it comprised more than 48 viral taxonomic families specific to vertebrates, invertebrates, plants, fungi, bacteria and protozoa. In addition, a considerable number of viral reads were related to viruses that are not classified by host. The read sequences related to animal viruses included parvoviruses, anelloviruses, circoviruses, flaviviruses, rhabdoviruses and seadornaviruses, which might be taken up by mosquitoes from viremic animal hosts during blood feeding. Notably, sample G1 contained the most abundant sequence related to Banna virus, which is of public health interest because it causes encephalitis in humans. Furthermore, non-classified viruses also accounted for a considerable share of the viral sequences in all the samples, presumably belonging to unexplored virus categories. Overall, the present study provides comprehensive knowledge of the diverse viral populations carried by mosquitoes at pig farms, which are a potential source of diseases for mammals, including humans and animals. These viral metagenomic data are valuable for assessment of emerging and re-emerging viral epidemics.
Affiliation(s)
- Muddassar Hameed
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Ke Liu
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Muhammad Naveed Anwar
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Abdul Wahaab
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Chenxi Li
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Di Di
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Xin Wang
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Sawar Khan
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Jinpeng Xu
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Beibei Li
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Mohsin Nawaz
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Donghua Shao
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Yafeng Qiu
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Jianchao Wei
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
- Zhiyong Ma
  - Shanghai Veterinary Research Institute, Chinese Academy of Agricultural Science, Shanghai, PR China
|
6
Xiao P, Han J, Zhang Y, Li C, Guo X, Wen S, Tian M, Li Y, Wang M, Liu H, Ren J, Zhou H, Lu H, Jin N. Metagenomic Analysis of Flaviviridae in Mosquito Viromes Isolated From Yunnan Province in China Reveals Genes From Dengue and Zika Viruses. Front Cell Infect Microbiol 2018; 8:359. [PMID: 30406038 PMCID: PMC6207848 DOI: 10.3389/fcimb.2018.00359] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/14/2018] [Accepted: 09/24/2018] [Indexed: 12/21/2022] Open
Abstract
More than 6,000 mosquitoes of six species from six sites were collected and tested for their virome using metagenomic sequencing and bioinformatic analysis. The identified viral sequences belonged to more than 50 viral families. The results were verified by PCR of selected viruses in all mosquitoes, followed by phylogenetic analysis. In the present study, we identified partial dengue virus (DENV), Zika virus (ZIKV), and Japanese encephalitis virus (JEV) sequences in mosquitoes. Metagenomic analysis and PCR amplification revealed three DENV sequences, one of which encodes a partial envelope protein. Two ZIKV sequences, both encoding partial nonstructural protein 3, and one JEV sequence encoding the complete envelope protein were identified. The viral titers of the newly isolated virus JEV-China/YN2016-1 varied across passages. The newly identified Zika virus gene from ZIKV-China/YN2016-1 was of the Asian genotype and shared the highest nucleotide sequence identity (97.1%) with a ZIKV sequence from Thailand isolated in 2004. Phylogenetic analysis of ZIKV-China/YN2016-1 and ZIKV-China/YN2016-2 with known Flavivirus genes indicated that ZIKV has propagated in Yunnan province, China.
Affiliation(s)
- Pengpeng Xiao
  - Yanbian University Medical College, Yanji, China
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
- Jicheng Han
  - Yanbian University Medical College, Yanji, China
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
- Ying Zhang
  - College of Veterinary Medicine, College of Animal Science, Jilin University, Changchun, China
- Chenghui Li
  - Yanbian University Medical College, Yanji, China
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
- Xiaofang Guo
  - Yunnan Institute of Parasitic Diseases, Simao, China
- Shubo Wen
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
- Mingyao Tian
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
  - Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou, China
- Yiquan Li
  - Yanbian University Medical College, Yanji, China
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
- Maopeng Wang
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
  - Institute of Virology, Wenzhou University, Wenzhou, China
- Hao Liu
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
  - School of Life Sciences and Engineering, Foshan University, Foshan, China
- Jingqiang Ren
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
  - Division of Economic Animal Epidemic, Institute of Special Economic Animal and Plant Sciences, Changchun, China
- Hongning Zhou
  - Yunnan Institute of Parasitic Diseases, Simao, China
- Huijun Lu
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
  - Jiangsu Co-innovation Center for Prevention and Control of Important Animal Infectious Diseases and Zoonoses, Yangzhou, China
- Ningyi Jin
  - Yanbian University Medical College, Yanji, China
  - Institute of Military Veterinary, Academy of Military Medical Sciences, Changchun, China
|
7
Telles GP, Araújo GS, Walter MEMT, Brigido MM, Almeida NF. Live neighbor-joining. BMC Bioinformatics 2018; 19:172. [PMID: 29769032 PMCID: PMC5956842 DOI: 10.1186/s12859-018-2162-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/12/2017] [Accepted: 04/25/2018] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND: In phylogenetic reconstruction the result is a tree where all taxa are leaves and internal nodes are hypothetical ancestors. In a live phylogeny, both ancestral and living taxa may coexist, leading to a tree where internal nodes may be living taxa. The well-known Neighbor-Joining heuristic is widely used for phylogenetic reconstruction. RESULTS: We present Live Neighbor-Joining, a heuristic for building a live phylogeny. We have investigated Live Neighbor-Joining on datasets of viral genomes, a plausible scenario for its application, which allowed the construction of alternative hypotheses for the relationships among viruses that embrace both ancestral and descending taxa. We also applied Live Neighbor-Joining to a set of bacterial genomes and to sets of images and texts. Non-biological data may be better explored visually when their relationship in terms of content similarity is represented by means of a phylogeny. CONCLUSION: Our experiments have shown interesting alternative phylogenetic hypotheses for RNA virus genomes and bacterial genomes, and alternative relationships among images and texts, illustrating a wide range of scenarios where Live Neighbor-Joining may be used.
Affiliation(s)
- Guilherme P Telles
  - Instituto de Computação, Universidade Estadual de Campinas, Cidade Universitária, Campinas, 13083-852, Brazil
- Graziela S Araújo
  - Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Av. Costa e Silva, s/n, Campo Grande, 79070-900, Brazil
- Maria E M T Walter
  - Departamento de Ciência da Computação, Universidade de Brasília, Campus Darcy Ribeiro, Brasília, 70910-900, Brazil
- Marcelo M Brigido
  - Instituto de Ciências Biológicas, Universidade de Brasília, Campus Darcy Ribeiro, Brasília, 70910-900, Brazil
- Nalvo F Almeida
  - Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Av. Costa e Silva, s/n, Campo Grande, 79070-900, Brazil
|
8
Genome-wide genotyping uncovers genetic profiles and history of the Russian cattle breeds. Heredity (Edinb) 2017; 120:125-137. [PMID: 29217829 DOI: 10.1038/s41437-017-0024-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 07/16/2017] [Revised: 10/09/2017] [Accepted: 10/23/2017] [Indexed: 12/25/2022] Open
Abstract
One of the most economically important areas within the Russian agricultural sector is dairy and beef cattle farming, which contributes about $11 billion to the Russian economy annually. Trade connections, selection and breeding have resulted in the establishment of a number of breeds that are presumably adapted to local climatic conditions. Little, however, is known about the ancestry and history of Russian native cattle. To address this question, we genotyped 274 individuals from 18 breeds bred in Russia and compared them to 135 additional breeds from around the world that had been genotyped previously. Our results suggest a shared ancestry between most of the Russian cattle and European taurine breeds, apart from a few breeds that shared ancestry with the Asian taurines. The Yakut cattle, belonging to the latter group, was found to be the most diverged breed in the whole combined dataset according to structure results. Haplotype sharing further suggests that the Russian cattle can be divided into four major clusters reflecting ancestral relations with other breeds. Herein, we therefore shed light on the history of Russian cattle and identify breeds closely related to those from Russia. Our results will facilitate future research on detecting signatures of selection in cattle genomes and eventually inform future genetics-assisted livestock breeding programs in Russia and in other countries.
9
Feng AJ, Xiao X, Ye CC, Xu XM, Zhu Q, Yuan JP, Hong YH, Wang JH. Isolation and characterization of Burkholderia fungorum Gan-35 with the outstanding ammonia nitrogen-degrading ability from the tailings of rare-earth-element mines in southern Jiangxi, China. AMB Express 2017; 7:140. [PMID: 28655218 PMCID: PMC5484655 DOI: 10.1186/s13568-017-0434-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/14/2017] [Accepted: 06/16/2017] [Indexed: 11/12/2022] Open
Abstract
The exploitation of rare-earth-element (REE) mines has resulted in severe ammonia nitrogen pollution and induced hazards to the environment and human health. Screening microorganisms with ammonia nitrogen-degrading ability provides a basis for bioremediation of ammonia nitrogen-polluted environments. In this study, a bacterium with outstanding ammonia nitrogen-degrading capability was isolated from the tailings of REE mines in southern Jiangxi Province, China. This strain was identified as Burkholderia fungorum Gan-35 according to phenotypic and phylogenetic analyses. The optimal conditions for ammonia nitrogen degradation by strain Gan-35 were determined as follows: pH value, 7.5; inoculum dose, 10%; incubation time, 44 h; temperature, 30 °C; and C/N ratio, 15:1. Strain Gan-35 degraded 68.6% of ammonia nitrogen under the optimized conditions. Nepeta cataria grew markedly better in ammonia nitrogen-polluted soil inoculated with strain Gan-35 than in soil without inoculation, and the decrease in ammonia nitrogen content was also more pronounced in the inoculated soil. In addition, strain Gan-35 exhibited tolerance to high salinity. In summary, strain Gan-35 can both degrade ammonia nitrogen at high concentrations and promote plant growth. This work reports a Burkholderia strain with ammonia nitrogen-degrading capability for the first time and is also the first study on the isolation of a bacterium with ammonia nitrogen-degrading ability from the tailings of REE mines. The results are useful for developing an effective method for microbial remediation of the ammonia nitrogen-polluted tailings of REE mines.
10
Hua GJ, Hung CL, Lin CY, Wu FC, Chan YW, Tang CY. MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL. Evol Bioinform Online 2017; 13:1176934317734220. [PMID: 29051701 PMCID: PMC5637958 DOI: 10.1177/1176934317734220] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/08/2017] [Accepted: 09/06/2017] [Indexed: 11/15/2022] Open
Abstract
A phylogenetic tree is a visual diagram of the relationships among a set of biological species. Scientists use it to analyze many characteristics of the species. Distance-matrix methods, such as Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods suffer from poor computational performance. Although several new methods using high-performance hardware and frameworks have been proposed, the issue persists. In this work, a novel parallel UPGMA approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from an extremely large set of sequences. The experimental results show that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphics cards achieves approximately 3-fold to 7-fold speedup over implementations of UPGMA on a modern CPU and a single GPU, respectively.
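For orientation, the sequential UPGMA procedure that this work parallelizes can be sketched in a few lines. The version below is a plain single-threaded reference, not the authors' multi-GPU NCCL implementation; the function name `upgma` and the nested-tuple tree format `(left, right, height)` are assumptions made for illustration.

```python
def upgma(dist, labels):
    """Naive UPGMA on a symmetric distance matrix.

    Returns a nested tuple (left, right, height), where height is half the
    distance at which the two subtrees were merged. Leaves are label strings.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of clusters (first one wins on ties).
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Size-weighted average distance from the merged cluster to the rest.
        merged = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            merged[c] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        # Drop every distance that involves a or b, then add the new ones.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, v in merged.items():
            d[(c, next_id)] = v  # existing ids are always smaller than next_id
        clusters[next_id] = ((tree_a, tree_b, dab / 2), size_a + size_b)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree
```

The repeated minimum search and distance update in the loop are the costly parts for large inputs, which is the kind of work a GPU implementation would distribute; this sketch makes no attempt at that.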
Affiliation(s)
- Guan-Jie Hua
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Che-Lun Hung
  - Big Data Research Center, Department of Computer Science and Communication Engineering, Providence University, Taichung, Taiwan
- Chun-Yuan Lin
  - Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan
- Fu-Che Wu
  - Department of Computer Science and Communication Engineering, Providence University, Taichung, Taiwan
- Yu-Wei Chan
  - College of Computing and Informatics, Providence University, Taichung, Taiwan
- Chuan Yi Tang
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
  - Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan