1
Chen YA, Ng PY, Garcia-Ruiz D, Elliot A, Palmer B, Assunção Carvalho RMCD, Tseng LF, Lee CS, Tsai KH, Greenhouse B, Chang HH. Genetic surveillance reveals low but sustained malaria transmission with clonal replacement in Sao Tome and Principe. Commun Med 2025; 5:199. [PMID: 40425726 PMCID: PMC12116912 DOI: 10.1038/s43856-025-00905-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Accepted: 05/09/2025] [Indexed: 05/29/2025]
Abstract
BACKGROUND Despite efforts to eliminate malaria in Sao Tome and Principe (STP), cases have recently increased. Understanding residual transmission structure is crucial for developing effective elimination strategies. METHODS This study collected surveillance data and generated amplicon sequencing data from 980 samples between 2010 and 2016 to examine the genetic structure of the parasite population. RESULTS Here we show that the mean multiplicity of infection (MOI) is 1.3, with 11% polyclonal infections, indicating low transmission intensity. Temporal trends of these genetic metrics do not align with incidence rates, suggesting that changes in genetic metrics may not straightforwardly reflect changes in transmission intensity, particularly in low transmission settings where genetic drift and importation have a substantial impact. While 88% of samples are genetically linked, continuous turnover in genetic clusters and changes in drug-resistance haplotypes are observed. Principal component analysis reveals some STP samples are genetically similar to those from Central and West Africa, indicating possible importation. CONCLUSIONS These findings highlight the need to prioritize several interventions, such as targeted interventions against transmission hotspots, reactive case detection, and strategies to reduce the introduction of new parasites into this island nation as it approaches elimination. This study also serves as a case study for implementing genetic surveillance in a low transmission setting.
Affiliation(s)
- Ying-An Chen
  - EPPIcenter Research Program, Division of HIV, Infectious Diseases and Global Medicine, Department of Medicine, University of California, San Francisco, CA, USA
  - Institute of Bioinformatics and Structural Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan, ROC
- Peng-Yin Ng
  - Institute of Bioinformatics and Structural Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan, ROC
- Daniel Garcia-Ruiz
  - Institute of Bioinformatics and Structural Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan, ROC
  - Bioinformatics Program, Institute of Statistical Science, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan, ROC
- Aaron Elliot
  - EPPIcenter Research Program, Division of HIV, Infectious Diseases and Global Medicine, Department of Medicine, University of California, San Francisco, CA, USA
- Brian Palmer
  - EPPIcenter Research Program, Division of HIV, Infectious Diseases and Global Medicine, Department of Medicine, University of California, San Francisco, CA, USA
- Lien-Fen Tseng
  - Taiwan Anti-Malarial Advisory Mission, São Tomé, São Tomé and Príncipe
- Cheng-Sheng Lee
  - Institute of Molecular and Cellular Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan, ROC
- Kun-Hsien Tsai
  - Taiwan Anti-Malarial Advisory Mission, São Tomé, São Tomé and Príncipe
  - Institute of Environmental and Occupational Health Sciences, College of Public Health, National Taiwan University, Taipei, Taiwan, ROC
- Bryan Greenhouse
  - EPPIcenter Research Program, Division of HIV, Infectious Diseases and Global Medicine, Department of Medicine, University of California, San Francisco, CA, USA
- Hsiao-Han Chang
  - Institute of Bioinformatics and Structural Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan, ROC
2
Chen Y, Ng PY, Garcia D, Elliot A, Palmer B, Assunção Carvalho RMCD, Tseng LF, Lee CS, Tsai KH, Greenhouse B, Chang HH. Genetic surveillance reveals low, sustained malaria transmission with clonal replacement in Sao Tome and Principe. medRxiv 2024:2024.07.15.24309968. [PMID: 39072035 PMCID: PMC11275696 DOI: 10.1101/2024.07.15.24309968] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/30/2024]
Abstract
Despite efforts to eliminate malaria in Sao Tome and Principe (STP), cases have recently increased. Understanding residual transmission structure is crucial for developing effective elimination strategies. This study collected surveillance data and generated amplicon sequencing data from 980 samples between 2010 and 2016 to examine the genetic structure of the parasite population. The mean multiplicity of infection (MOI) was 1.3, with 11% polyclonal infections, indicating low transmission intensity. Temporal trends of these genetic metrics did not align with incidence rates, suggesting that changes in genetic metrics may not straightforwardly reflect changes in transmission intensity, particularly in low transmission settings where genetic drift and importation have a substantial impact. While 88% of samples were genetically linked, continuous turnover in genetic clusters and changes in drug-resistance haplotypes were observed. Principal component analysis revealed some STP samples were genetically similar to those from Central and West Africa, indicating possible importation. These findings highlight the need to prioritize several interventions such as targeted interventions against transmission hotspots, reactive case detection, and strategies to reduce the introduction of new parasites into this island nation as it approaches elimination. This study also serves as a case study for implementing genetic surveillance in a low transmission setting.
Affiliation(s)
- Ying-An Chen
  - EPPIcenter Research Program, Division of HIV, Infectious Diseases and Global Medicine, Department of Medicine, University of California, San Francisco, United States
  - Institute of Bioinformatics and Structural Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan
- Peng-Yin Ng
  - Institute of Bioinformatics and Structural Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan
- Daniel Garcia
  - Institute of Bioinformatics and Structural Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan
  - Bioinformatics Program, Institute of Statistical Science, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan
- Aaron Elliot
  - EPPIcenter Research Program, Division of HIV, Infectious Diseases and Global Medicine, Department of Medicine, University of California, San Francisco, United States
- Brian Palmer
  - EPPIcenter Research Program, Division of HIV, Infectious Diseases and Global Medicine, Department of Medicine, University of California, San Francisco, United States
- Lien-Fen Tseng
  - Taiwan Anti-Malarial Advisory Mission, São Tomé, Democratic Republic of São Tomé and Príncipe
- Cheng-Sheng Lee
  - Institute of Molecular and Cellular Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan
- Kun-Hsien Tsai
  - Taiwan Anti-Malarial Advisory Mission, São Tomé, Democratic Republic of São Tomé and Príncipe
  - Institute of Environmental and Occupational Health Sciences, College of Public Health, National Taiwan University, Taipei, Taiwan
- Bryan Greenhouse
  - EPPIcenter Research Program, Division of HIV, Infectious Diseases and Global Medicine, Department of Medicine, University of California, San Francisco, United States
- Hsiao-Han Chang
  - Institute of Bioinformatics and Structural Biology, College of Life Sciences and Medicine, National Tsing Hua University, Hsinchu, Taiwan
3
Zhang P, Liu H, Wei Y, Zhai Y, Tian Q, Zou Q. FMAlign2: a novel fast multiple nucleotide sequence alignment method for ultralong datasets. Bioinformatics 2024; 40:btae014. [PMID: 38200554 PMCID: PMC10809904 DOI: 10.1093/bioinformatics/btae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 12/27/2023] [Accepted: 01/09/2024] [Indexed: 01/12/2024]
Abstract
MOTIVATION In bioinformatics, multiple sequence alignment (MSA) is a crucial task. However, conventional methods often struggle with aligning ultralong sequences. To address this issue, researchers have designed MSA methods rooted in a vertical division strategy, which segments sequence data for parallel alignment. A prime example of this approach is FMAlign, which uses the FM-index to extract common seeds and segment the sequences accordingly. RESULTS FMAlign2 leverages the suffix array to identify maximal exact matches, redefining FMAlign's approach from searching for global chains to searching for partial chains. Using a vertical division strategy, a large-scale problem is decomposed into manageable tasks, enabling parallel execution of sub-MSAs. Furthermore, sequence-profile alignment and refinement are incorporated to concatenate the subsets, yielding the final result seamlessly. Compared with FMAlign, FMAlign2 markedly improves the segmentation of sequences and significantly reduces running time while maintaining accuracy, especially on ultralong datasets. Importantly, FMAlign2 extends existing MSA methods with the capability to handle sequences reaching billions in length within an acceptable time frame. AVAILABILITY AND IMPLEMENTATION Source code and datasets are available at https://github.com/malabz/FMAlign2 and https://zenodo.org/records/10435770.
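The vertical division strategy summarized above can be illustrated with a toy sketch. It is not FMAlign2's algorithm: as a simplifying assumption it uses k-mers that occur exactly once in every sequence as anchors, in place of suffix-array maximal exact matches, and it chains them greedily rather than with FMAlign2's partial-chain search. The function names `unique_kmers`, `common_anchors` and `split_by_anchors` are hypothetical.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer occurring exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def common_anchors(seqs, k):
    """Greedily chain k-mers that are unique in every sequence.

    An anchor is kept only if it starts at or after the end of the previous
    anchor in *every* sequence, so the chain is collinear and non-overlapping.
    Greedy chaining is a simplification and is not guaranteed to be optimal.
    """
    tables = [unique_kmers(s, k) for s in seqs]
    shared = set(tables[0]).intersection(*tables[1:])
    # Visit candidates in order of their position in the first sequence.
    candidates = sorted(shared, key=lambda kmer: tables[0][kmer])
    chain, end = [], [0] * len(seqs)
    for kmer in candidates:
        pos = [t[kmer] for t in tables]
        if all(p >= e for p, e in zip(pos, end)):
            chain.append(pos)
            end = [p + k for p in pos]
    return chain


def split_by_anchors(seqs, chain, k):
    """Cut each sequence at the anchors (vertical division).

    parts[s] alternates between an inter-anchor segment (to be aligned as an
    independent sub-MSA) and the anchor itself (identical in every sequence).
    """
    parts = [[] for _ in seqs]
    for s_idx, seq in enumerate(seqs):
        start = 0
        for pos in chain:
            p = pos[s_idx]
            parts[s_idx].append(seq[start:p])   # segment before the anchor
            parts[s_idx].append(seq[p:p + k])   # the anchor
            start = p + k
        parts[s_idx].append(seq[start:])        # trailing segment
    return parts
```

Concatenating `parts[s]` reproduces sequence `s`. In a full aligner, the segments at the same index would be aligned separately (possibly in parallel) and the aligned pieces joined column-wise.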
Affiliation(s)
- Pinglu Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, Zhejiang, China
- Huan Liu
  - School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, Sichuan, China
- Yanming Wei
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, Shaanxi, China
- Yixiao Zhai
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, Zhejiang, China
- Qinzhong Tian
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, Zhejiang, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, Zhejiang, China
4
Zheng H, Marçais G, Kingsford C. Creating and Using Minimizer Sketches in Computational Genomics. J Comput Biol 2023; 30:1251-1276. [PMID: 37646787 PMCID: PMC11082048 DOI: 10.1089/cmb.2023.0094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Processing large data sets has become an essential part of computational genomics. Greatly increased availability of sequence data from multiple sources has fueled breakthroughs in genomics and related fields but has also created computational challenges in processing large sequencing experiments. The minimizer sketch is a popular method for sequence sketching that underlies core steps in computational genomics such as read mapping, sequence assembly, k-mer counting, and more. In most applications, minimizer sketches are constructed using one of a few classical approaches. More recently, efforts have been made to build minimizer sketches with more desirable properties than the classical constructions. In this survey, we review the history of the minimizer sketch, the theories developed around the concept, and the plethora of applications taking advantage of such sketches. We aim to provide readers with a comprehensive picture of the research landscape involving minimizer sketches, in anticipation of better fusion of theory and application in the future.
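As a hedged illustration of the classical construction the survey refers to, the sketch below computes a (w, k)-minimizer sketch: for every window of w consecutive k-mers it keeps the smallest k-mer. Using lexicographic order by default is an assumption made for readability; practical tools more commonly use a random hash order, because lexicographic order tends to select more k-mers (higher density). The function name `minimizers` and its `order` parameter are hypothetical.

```python
def minimizers(seq, k, w, order=None):
    """Classical (w, k)-minimizer sketch of seq.

    For every window of w consecutive k-mers, select the k-mer that is
    smallest under `order` (lexicographic by default), breaking ties by the
    leftmost position. Returns the selected (position, k-mer) pairs.
    """
    key = order if order is not None else (lambda kmer: kmer)
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        # Smallest k-mer in this window; ties go to the leftmost position.
        best = min(range(start, start + w), key=lambda i: (key(kmers[i]), i))
        selected.add(best)
    return [(i, kmers[i]) for i in sorted(selected)]
```

For example, `minimizers("ACGTACGT", 3, 3)` keeps 3 of the 6 k-mers. Because every window of w k-mers contributes a selected k-mer, two sequences that share a substring of length w + k - 1 are guaranteed to share at least one minimizer, which is what makes the sketch useful for seeding read mapping and assembly.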
Affiliation(s)
- Hongyu Zheng
  - Computer Science Department, Princeton University, Princeton, New Jersey, USA
- Guillaume Marçais
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Carl Kingsford
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
5
Wei Y, Zou Q, Tang F, Yu L. WMSA: a novel method for multiple sequence alignment of DNA sequences. Bioinformatics 2022; 38:5019-5025. [PMID: 36179076 DOI: 10.1093/bioinformatics/btac658] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/11/2022] [Revised: 08/30/2022] [Accepted: 09/29/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Multiple sequence alignment (MSA) is a fundamental problem in bioinformatics. The quality of the alignment affects downstream analysis. MAFFT uses the Fast Fourier Transform to search for homologous segments and uses them as anchors to divide the sequences, then aligns only the segments, which saves time and memory without greatly reducing alignment quality. However, MAFFT becomes slow when the dataset is large. RESULTS We developed software, WMSA, which uses a divide-and-conquer method to split the sequences into clusters, aligns those clusters into profiles with the center star strategy and then performs a progressive profile-profile alignment. The alignment is conducted by the compiled algorithms of MAFFT and K-Band with multithreaded parallelism. Our method balances time, space and quality, and performs better than MAFFT in test experiments on highly conserved datasets. AVAILABILITY AND IMPLEMENTATION Source code is freely available at https://github.com/malabz/WMSA/, which is implemented in C/C++ and supported on Linux, and datasets are available at https://github.com/malabz/WMSA-dataset. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
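The abstract does not say how WMSA scores sequences when applying the center star strategy, so the following is only a hedged illustration of the general idea: the "center" is the sequence with the smallest total distance to all others, here measured with unit-cost edit distance (an assumption). The later steps (aligning every sequence pairwise to the center and merging the gaps) are omitted. The names `edit_distance` and `center_star_index` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, using two rolling DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match / substitution
        prev = cur
    return prev[-1]


def center_star_index(seqs):
    """Index of the sequence with minimal total edit distance to all others."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    return min(range(len(seqs)), key=totals.__getitem__)
```

Choosing the center needs all pairwise distances, which is quadratic in the number of sequences; splitting the input into clusters first, as WMSA's divide-and-conquer step does, keeps each such computation small.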
Affiliation(s)
- Yanming Wei
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
- Furong Tang
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
6
Chao J, Tang F, Xu L. Developments in Algorithms for Sequence Alignment: A Review. Biomolecules 2022; 12:biom12040546. [PMID: 35454135 PMCID: PMC9024764 DOI: 10.3390/biom12040546] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/11/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 01/27/2023]
Abstract
The continuous development of sequencing technologies has enabled researchers to obtain large amounts of biological sequence data, and this has resulted in increasing demand for software that can perform sequence alignment quickly and accurately. A number of algorithms and tools for sequence alignment have been designed to meet the various needs of biologists. Here, the prevailing ideas in sequence alignment research and some quality estimation methods for multiple sequence alignment tools are summarized.
Affiliation(s)
- Jiannan Chao
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Furong Tang
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
7
Zhang Y, Zhang Q, Zhou J, Zou Q. A survey on the algorithm and development of multiple sequence alignment. Brief Bioinform 2022; 23:6546258. [PMID: 35272347 DOI: 10.1093/bib/bbac069] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/09/2021] [Revised: 01/30/2022] [Accepted: 02/09/2022] [Indexed: 12/21/2022]
Abstract
Multiple sequence alignment (MSA) is an essential cornerstone of bioinformatics, which can reveal latent information in biological sequences, such as function, evolution and structure. MSA is widely used in many bioinformatics scenarios, such as phylogenetic analysis, protein analysis and genomic analysis. However, MSA faces new challenges with the gradual increase in sequence scale and the increasing demand for alignment accuracy. Therefore, developing an efficient and accurate strategy for MSA has become one of the research hotspots in bioinformatics. In this work, we mainly summarize the algorithms for MSA and its applications in bioinformatics. To provide a structured and clear perspective, we systematically introduce the fundamentals of MSA, including background, databases, metrics and benchmarks. In addition, we list the most common applications of MSA in bioinformatics, including database searching, phylogenetic analysis, genomic analysis, metagenomic analysis and protein analysis. Furthermore, we categorize and analyze classical and state-of-the-art algorithms, dividing them into progressive alignment, iterative algorithms, heuristics, machine learning and divide-and-conquer. We also discuss the challenges and opportunities of MSA in bioinformatics. Our work provides a comprehensive survey of MSA applications and their relevant algorithms. It could offer valuable insights to researchers contributing to MSA and related studies.
Affiliation(s)
- Yongqing Zhang
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, 611731, Chengdu, China
- Qiang Zhang
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Jiliu Zhou
  - School of Computer Science, Chengdu University of Information Technology, 610225, Chengdu, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610054, Chengdu, China
8
Biological sequence analysis. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00003-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
9
Liu H, Zou Q, Xu Y. A novel fast multiple nucleotide sequence alignment method based on FM-index. Brief Bioinform 2021; 23:6458932. [PMID: 34893794 DOI: 10.1093/bib/bbab519] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/20/2021] [Revised: 10/19/2021] [Accepted: 11/14/2021] [Indexed: 11/13/2022]
Abstract
Multiple sequence alignment (MSA) is fundamental to many biological applications. However, most classical MSA algorithms have difficulty handling large-scale sets of sequences, especially long sequences. Therefore, some recent aligners adopt an efficient divide-and-conquer strategy to divide long sequences into several short subsequences. Selecting the common segments (i.e. anchors) used to divide the sequences is critical, as it directly affects accuracy and time cost. We therefore proposed a novel algorithm, FMAlign, to improve the performance of multiple nucleotide sequence alignment. We use the FM-index to extract long common segments at low cost rather than using a space-consuming hash table. After finding the longer optimal common segments, the sequences are divided by them. FMAlign has been tested on virus, bacteria and human mitochondrial genome datasets, and compared with existing MSA methods such as MAFFT, HAlign and FAME. The experiments show that our method outperforms the existing methods in terms of running time and has high accuracy on long sequence sets. All the results demonstrate that our method is applicable to large-scale nucleotide sequence sets in terms of both sequence length and sequence number. The source code and related data are accessible at https://github.com/iliuh/FMAlign.
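FMAlign's own anchor-selection procedure is not reproduced here. As a hedged sketch of the standard FM-index query it builds on, the code below computes a Burrows-Wheeler transform and uses backward search to count occurrences of a pattern without a hash table. The class `FMIndex` and the function `bwt` are hypothetical names, the construction is a naive toy (sorting all rotations), and the code assumes the text does not contain the `$` sentinel.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy, O(n^2 log n))."""
    text += "$"  # sentinel: assumed absent from text; sorts before A/C/G/T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)


class FMIndex:
    def __init__(self, text):
        # Last column of the sorted rotation matrix.
        self.L = bwt(text)
        # C[c] = number of characters in the text (incl. '$') smaller than c.
        self.C, total = {}, 0
        for c in sorted(set(self.L)):
            self.C[c] = total
            total += self.L.count(c)
        # occ[c][i] = occurrences of c in L[:i]; stored densely (no sampling).
        self.occ = {c: [0] for c in self.C}
        for ch in self.L:
            for c in self.occ:
                self.occ[c].append(self.occ[c][-1] + (ch == c))

    def count(self, pattern):
        """Backward search: number of occurrences of pattern in the text."""
        lo, hi = 0, len(self.L)  # current suffix-array interval [lo, hi)
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo
```

For example, `FMIndex("GATTACA").count("TA")` returns 1. Each backward-search step touches only the `C` table and two occurrence counts, so a query costs time proportional to the pattern length rather than the text length; production FM-indexes sample the occurrence table and suffix array to save memory.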
Affiliation(s)
- Huan Liu
  - School of Computer Science, University of Science and Technology of China and Key Laboratory on High Performance Computing of Anhui, China
- Quan Zou
  - Institute of basic and Frontier Sciences, University of Electronic Science and Technology of China and Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Yun Xu
  - School of Computer Science, University of Science and Technology of China and Key Laboratory on High Performance Computing of Anhui, China