1
Hu G, Zhou T, Zhou P, Yau SST. Novel natural vector with asymmetric covariance for classifying biological sequences. Gene 2025; 962:149532. [PMID: 40367998 DOI: 10.1016/j.gene.2025.149532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2025] [Revised: 04/07/2025] [Accepted: 04/23/2025] [Indexed: 05/16/2025]
Abstract
The genome sequences of organisms form a large and complex landscape, presenting a significant challenge in bioinformatics: how to use mathematical tools to describe and analyze this space effectively. The ability to compare relationships between different organisms depends on a rational mapping rule that can uniformly encode genome sequences of varying lengths as vectors in a measurable space. Such a mapping would enable researchers to apply modern mathematical and machine learning techniques to otherwise challenging genomic comparisons. The natural vector method has been proposed as a concise and effective approach to accomplish this. However, its various iterations have certain limitations. In response, we carefully analyze the strengths and weaknesses of these natural vector methods and propose an improved version, the asymmetric covariance natural vector (ACNV) method. This new method incorporates k-mer information alongside covariance computations with asymmetric properties between base positions. We tested ACNV on microbial genome sequence datasets, including bacterial, fungal, and viral sequences, evaluating its performance in terms of classification accuracy and convex hull separation. The results demonstrate that ACNV effectively captures sequence characteristics, showcasing its robust sequence representation capabilities and highlighting its elegant geometric properties.
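To make the idea of encoding sequences of varying lengths as fixed-length vectors concrete, the following is a minimal sketch of the classical 12-dimensional natural vector for DNA (per-base counts, mean positions, and normalized second central moments). It is not the ACNV method described in the abstract, whose k-mer and asymmetric covariance terms are not specified here; the function name `natural_vector` and the handling of absent bases (mean and moment set to 0) are assumptions made for illustration.

```python
def natural_vector(seq, alphabet="ACGT"):
    """Classical natural vector sketch: counts, mean positions, and
    normalized second central moments for each base in `alphabet`.

    Assumption: bases that never occur get mean 0.0 and moment 0.0.
    Characters outside `alphabet` are ignored but still occupy a position.
    """
    seq = seq.upper()
    n = len(seq)

    # Collect the 1-based positions of every base.
    positions = {base: [] for base in alphabet}
    for index, char in enumerate(seq, start=1):
        if char in positions:
            positions[char].append(index)

    counts, means, moments = [], [], []
    for base in alphabet:
        pos = positions[base]
        n_k = len(pos)
        counts.append(n_k)
        if n_k == 0:
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(pos) / n_k
        means.append(mu_k)
        # Second central moment, normalized by n_k * n.
        moments.append(sum((p - mu_k) ** 2 for p in pos) / (n_k * n))

    return counts + means + moments


if __name__ == "__main__":
    # Sequences of different lengths map to vectors of the same dimension.
    print(len(natural_vector("AATA")), len(natural_vector("ACGTACGTAC")))
```

Because every sequence maps to a vector of the same dimension, standard distance measures and classifiers can be applied directly to the resulting vectors.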
Affiliation(s)
- Guoqing Hu
  - Beijing Institute of Mathematical Sciences and Applications (BIMSA), 101408, Beijing, China
- Tao Zhou
  - Department of Mathematical Sciences, Tsinghua University, 100084, Beijing, China
- Piyu Zhou
  - Beijing Institute of Mathematical Sciences and Applications (BIMSA), 101408, Beijing, China; State Key Laboratory of Mathematical Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100190, Beijing, China; University of Chinese Academy of Sciences, 100049, Beijing, China
- Stephen Shing-Toung Yau
  - Beijing Institute of Mathematical Sciences and Applications (BIMSA), 101408, Beijing, China; Department of Mathematical Sciences, Tsinghua University, 100084, Beijing, China
2
Vargas-Pinilla P, S Oliveira Fam B, Medina Tavares G, Lima T, Landau L, Paré P, de Cássia Aleixo Tostes R, Pissinatti A, Falótico T, Costa-Neto C, Maestri R, Bortolini MC. From molecular variations to behavioral adaptations: Unveiling adaptive epistasis in primate oxytocin system. American Journal of Biological Anthropology 2024; 184:e24947. [PMID: 38783700 DOI: 10.1002/ajpa.24947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 04/12/2024] [Accepted: 04/24/2024] [Indexed: 05/25/2024]
Abstract
OBJECTIVE Our primary objective was to investigate the variability of oxytocin (OT) and the GAMEN binding motif within the LNPEP oxytocinase in primates. MATERIALS AND METHODS We sequenced the LNPEP segment encompassing the GAMEN motif in 34 Platyrrhini species, with 21 of them also sequenced for the OT gene. Our dataset was supplemented with primate sequences of LNPEP, OT, and the oxytocin receptor (OTR) sourced from public databases. Evolutionary analyses and coevolution predictions were followed by a macroevolutionary analysis of relevant amino acids associated with phenotypic traits, such as mating systems, parental care, and litter size. To account for phylogenetic structure, we used two distinct statistical tests. Additionally, we calculated binding energies, focusing on the interaction between Callithrix jacchus VAMEN and Pro8OT. RESULTS We identified two novel motifs (AAMEN and VAMEN), challenging the current knowledge of motif conservation in placental mammals. Coevolution analysis demonstrated a correlation between GAMEN, AAMEN, and VAMEN and their corresponding OTs and OTRs. Callithrix jacchus exhibited a higher binding energy between VAMEN and Pro8OT than the orthologous molecules found in humans (GAMEN and Leu8OT). DISCUSSION The coevolution of AAMEN and VAMEN with their corresponding OTs and OTRs suggests a functional relationship that could have contributed to specific reproductive and adaptive behaviors, including paternal care, social monogamy, and twin births, which are prominent traits in Cebidae species such as marmosets and tamarins. Our findings underscore the coevolution of taxon-specific amino acids among the three studied molecules, shedding light on the oxytocinergic system as an adaptive epistatic repertoire in primates.
Affiliation(s)
- Pedro Vargas-Pinilla
  - Laboratory of Human and Molecular Evolution, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
  - Departamento de Farmacologia, Faculdade de Medicina, Universidade de São Paulo, Ribeirão Preto, Brazil
- Bibiana S Oliveira Fam
  - Laboratory of Human and Molecular Evolution, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
  - Laboratório de Medicina Genômica, Centro de Pesquisa Experimental (CPE), Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil
- Gustavo Medina Tavares
  - Laboratory of Human and Molecular Evolution, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Thaynara Lima
  - Laboratory of Human and Molecular Evolution, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Luane Landau
  - Laboratory of Human and Molecular Evolution, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
  - Department of Biological Sciences, University at Buffalo, Buffalo, New York, USA
- Pâmela Paré
  - Laboratory of Human and Molecular Evolution, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Tiago Falótico
  - Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, São Paulo, Brazil
- Cláudio Costa-Neto
  - Departamento de Bioquímica e Imunologia, Faculdade de Medicina, Universidade de São Paulo, Ribeirão Preto, Brazil
- Renan Maestri
  - Laboratório de Ecomorfologia e Macroevolução, Departamento de Ecologia, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
- Maria Cátira Bortolini
  - Laboratory of Human and Molecular Evolution, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
3
Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
Affiliation(s)
- Emily N. Kennedy
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
  - Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
4
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Kevin Wilhelm
  - Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
5
Robins WP, Mekalanos JJ. Covariance predicts conserved protein residue interactions important for the emergence and continued evolution of SARS-CoV-2 as a human pathogen. PLoS One 2022; 17:e0270276. [PMID: 35895734 PMCID: PMC9328546 DOI: 10.1371/journal.pone.0270276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 06/07/2022] [Indexed: 12/03/2022]
Abstract
SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that likely emerged from animal reservoirs. Differences in nucleotide and protein sequence composition within related β-coronaviruses are often used to better understand CoV evolution, host adaptation, and their emergence as human pathogens. Here we report the comprehensive analysis of amino acid residue changes that have occurred in lineage B β-coronaviruses that show covariance with each other. This analysis revealed patterns of covariance within conserved viral proteins that potentially define conserved interactions within and between core proteins encoded by SARS-CoV-2 related β-coronaviruses. We identified not only individual pairs but also networks of amino acid residues that exhibited statistically high frequencies of covariance with each other using an independent pair model followed by a tandem model approach. Using 149 different CoV genomes that vary in their relatedness, we identified networks of unique combinations of alleles that can be incrementally traced genome by genome within different phylogenic lineages. Remarkably, covariant residues and their respective regions most abundantly represented are implicated in the emergence of SARS-CoV-2 and are also enriched in dominant SARS-CoV-2 variants.
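As a rough illustration of what "covariance" between residue positions can mean in practice, the sketch below scores a pair of alignment columns by their mutual information. This is a commonly used stand-in, not the independent pair model or tandem model reported in the abstract; the function name `column_mutual_information` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length aligned sequences (strings).
    Assumption: every character, including gaps, is treated as a state.
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]

    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 always change together; column 2 is invariant.
    toy = ["AKL", "AKL", "GRL", "GRL"]
    print(column_mutual_information(toy, 0, 1))  # covarying pair
    print(column_mutual_information(toy, 0, 2))  # invariant partner column
```

A high score flags positions that tend to change together across genomes. Real analyses typically also correct for phylogenetic relatedness and sampling bias, which this sketch does not attempt.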
Affiliation(s)
- William P. Robins
  - Department of Microbiology, Harvard Medical School, Boston, Massachusetts, United States of America
- John J. Mekalanos
  - Department of Microbiology, Harvard Medical School, Boston, Massachusetts, United States of America
6
Robins WP, Mekalanos JJ. Covariance predicts conserved protein residue interactions important to the emergence and continued evolution of SARS-CoV-2 as a human pathogen. bioRxiv 2022:2022.01.13.476204. [PMID: 35169805 PMCID: PMC8845505 DOI: 10.1101/2022.01.13.476204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that likely emerged from animal reservoirs. Differences in nucleotide and protein sequence composition within related β-coronaviruses are often used to better understand CoV evolution, host adaptation, and their emergence as human pathogens. Here we report the comprehensive analysis of amino acid residue changes that have occurred in lineage B β-coronaviruses that show covariance with each other. This analysis revealed patterns of covariance within conserved viral proteins that potentially define conserved interactions within and between core proteins encoded by SARS-CoV-2 related β-coronaviruses. We identified not only individual pairs but also networks of amino acid residues that exhibited statistically high frequencies of covariance with each other using an independent pair model followed by a tandem model approach. Using 149 different CoV genomes that vary in their relatedness, we identified networks of unique combinations of alleles that can be incrementally traced genome by genome within different phylogenic lineages. Remarkably, covariant residues and their respective regions most abundantly represented are implicated in the emergence of SARS-CoV-2 and are also enriched in dominant SARS-CoV-2 variants.
Affiliation(s)
- William P Robins
  - Department of Microbiology, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115
- John J Mekalanos
  - Department of Microbiology, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115
7
Landau LJB, Fam BSDO, Yépez Y, Caldas-Garcia GB, Pissinatti A, Falótico T, Reales G, Schüler-Faccini L, Sortica VA, Bortolini MC. Evolutionary analysis of the anti-viral STAT2 gene of primates and rodents: Signature of different stages of an arms race. Infection, Genetics and Evolution 2021; 95:105030. [PMID: 34384937 DOI: 10.1016/j.meegid.2021.105030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/28/2021] [Revised: 07/24/2021] [Accepted: 08/06/2021] [Indexed: 02/04/2023]
Abstract
STAT2 plays a strategic role in defending against viral infection through the signaling cascade involving the immune system that is initiated after type I interferon release. Many flaviviruses target the inactivation or degradation of STAT2 as a strategy to impair this line of host defense. Primates are natural reservoirs for a range of disease-causing flaviviruses (e.g., Zika, Dengue, and Yellow Fever virus), while rodents appear less susceptible. We analyzed the STAT2 coding sequence of 28 Rodentia species and 49 Primates species. Original data from 19 Platyrrhini species were sequenced for the SH2 domain of STAT2 and included in the analysis. STAT2 has many sites whose variation can be explained by positive selection, as measured by two methods (PALM indicated 12, MEME 61). Both evolutionary tests significantly marked sites 127, 731, 739, 766, and 780. SH2 is under evolutionary constraint but presents episodic positive selection events within Rodentia: in one of them, a moderately radical change (serine > arginine) at position 638 is found in Peromyscus species and may be implicated in the difference in susceptibility to flaviviruses within Rodentia. Some other positively selected sites are functional, such as 5, 95, 203, 251, 782, and 829. Sites 251 and 287 regulate the signaling mediated by the JAK-STAT2 pathway, while 782 and 829 create a stable tertiary structure of STAT2, facilitating its connection with transcriptional co-activators. Only three positively selected sites, 5, 95, and 203, are recognized as acting on the interface between STAT2 and the flavivirus NS5 protein. We suggest that, owing to their higher evolutionary rate, rodents are at this moment taking some advantage in the battle against infection by some well-known Flaviviridae, in particular when compared to primates. Our results point to dynamics that fit a molecular evolutionary scenario shaped by a thought-provoking virus-host arms race.
Affiliation(s)
- Luane Jandira Bueno Landau
  - Laboratório de Evolução Humana e Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Bibiana Sampaio de Oliveira Fam
  - Laboratório de Evolução Humana e Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Yuri Yépez
  - Laboratório de Evolução Humana e Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Gabriela Barreto Caldas-Garcia
  - Laboratório de Evolução Humana e Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Alcides Pissinatti
  - Rio de Janeiro's Primatology Center (RJPC - INEA), Rio de Janeiro, RJ, Brazil
- Tiago Falótico
  - School of Arts, Sciences and Humanities, University of São Paulo, São Paulo, SP, Brazil
- Guillermo Reales
  - Laboratório de Evolução Humana e Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil; Instituto Nacional de Genética Médica Populacional, Serviço de Genética Médica, Hospital de Clínicas de Porto Alegre, Porto Alegre, RS, Brazil
- Lavínia Schüler-Faccini
  - Instituto Nacional de Genética Médica Populacional, Serviço de Genética Médica, Hospital de Clínicas de Porto Alegre, Porto Alegre, RS, Brazil
- Vinicius Albuquerque Sortica
  - Laboratório de Evolução Humana e Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Maria Cátira Bortolini
  - Laboratório de Evolução Humana e Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
8
Tang J, Wang Y, Luo Y, Fu J, Zhang Y, Li Y, Xiao Z, Lou Y, Qiu Y, Zhu F. Computational advances of tumor marker selection and sample classification in cancer proteomics. Comput Struct Biotechnol J 2020; 18:2012-2025. [PMID: 32802273 PMCID: PMC7403885 DOI: 10.1016/j.csbj.2020.07.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 04/07/2020] [Revised: 07/06/2020] [Accepted: 07/08/2020] [Indexed: 12/11/2022]
Abstract
Cancer proteomics has become a powerful technique for characterizing the protein markers driving malignant transformation, tracing proteome variation triggered by therapeutics, and discovering novel targets and drugs for the treatment of oncologic diseases. To facilitate cancer diagnosis/prognosis and accelerate drug target discovery, a variety of methods for tumor marker identification and sample classification have been developed and successfully applied to cancer proteomic studies. This review article describes the most recent advances in those approaches together with their current applications in cancer-related studies. Firstly, a number of popular feature selection methods are overviewed with an objective evaluation of their advantages and disadvantages. Secondly, these methods are grouped into three major classes based on their underlying algorithms. Finally, a variety of sample separation algorithms are discussed. This review provides a comprehensive overview of the advances in tumor marker identification and patient/sample/tissue separation, which could guide research in cancer proteomics.
Key Words
- ANN, Artificial Neural Network
- ANOVA, Analysis of Variance
- CFS, Correlation-based Feature Selection
- Cancer proteomics
- Computational methods
- DAPC, Discriminant Analysis of Principal Component
- DT, Decision Trees
- EDA, Estimation of Distribution Algorithm
- FC, Fold Change
- GA, Genetic Algorithms
- GR, Gain Ratio
- HC, Hill Climbing
- HCA, Hierarchical Cluster Analysis
- IG, Information Gain
- LDA, Linear Discriminant Analysis
- LIMMA, Linear Models for Microarray Data
- MBF, Markov Blanket Filter
- MWW, Mann–Whitney–Wilcoxon test
- OPLS-DA, Orthogonal Partial Least Squares Discriminant Analysis
- PCA, Principal Component Analysis
- PLS-DA, Partial Least Square Discriminant Analysis
- RF, Random Forest
- RF-RFE, Random Forest with Recursive Feature Elimination
- SA, Simulated Annealing
- SAM, Significance Analysis of Microarrays
- SBE, Sequential Backward Elimination
- SFS, Sequential Forward Selection
- SOM, Self-organizing Map
- SU, Symmetrical Uncertainty
- SVM, Support Vector Machine
- SVM-RFE, Support Vector Machine with Recursive Feature Elimination
- Sample classification
- Tumor marker selection
- sPLSDA, Sparse Partial Least Squares Discriminant Analysis
- t-SNE, t-Distributed Stochastic Neighbor Embedding
- χ2, Chi-square
Affiliation(s)
- Jing Tang
  - Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jianbo Fu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yang Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; School of Pharmaceutical Sciences and Innovative Drug Research Centre, Chongqing University, Chongqing 401331, China
- Yi Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziyu Xiao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yan Lou
  - Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
- Yunqing Qiu
  - Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
- Feng Zhu
  - Department of Bioinformatics, Chongqing Medical University, Chongqing 400016, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
9
Factors Influencing the Prevalence of Resistance-Associated Substitutions in NS5A Protein in Treatment-Naive Patients with Chronic Hepatitis C. Biomedicines 2020; 8:biomedicines8040080. [PMID: 32272736 PMCID: PMC7235841 DOI: 10.3390/biomedicines8040080] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/06/2020] [Revised: 03/24/2020] [Accepted: 04/05/2020] [Indexed: 12/12/2022]
Abstract
Direct-acting antivirals (DAAs) have revolutionized the treatment of hepatitis C virus (HCV) infection. Resistance-associated substitutions (RASs) present at baseline impair the response to DAAs through rapid selection of resistant HCV strains. NS5A is an indispensable target of current DAA treatment regimens. We evaluated the prevalence of RASs in NS5A in DAA-naïve patients infected with HCV 1a (n = 19), 1b (n = 93), and 3a (n = 90) before the systematic application of DAAs in the Russian Federation. The total proportion of strains carrying at least one RAS was 35.1% (71/202). In HCV 1a we detected only M28V (57.9%), attributed to a founder effect. Common RASs in HCV 1b were R30Q (7.5%), L31M (5.4%), P58S (4.4%), and Y93H (5.4%); in HCV 3a, they were A30S (31.0%), A30K (5.7%), S62L (8.9%), and Y93H (2.2%). The prevalence of RASs in NS5A of HCV 1b and 3a was similar to that worldwide, including in countries practicing massive DAA application; that is, it was not related to treatment. NS5A with and without RASs exhibited different covariance networks, which could be attributed to the need to preserve viral fitness. The majority of RASs were localized in polymorphic regions subjected to immune pressure, with selected substitutions allowing immune escape. Altogether, this explains the high prevalence of RASs in NS5A and the low barrier to their appearance in the DAA-inexperienced population.
10
Rodriguez-Sabate C, Morales I, Puertas-Avendaño R, Rodriguez M. The dynamic of basal ganglia activity with a multiple covariance method: influences of Parkinson's disease. Brain Commun 2019; 2:fcz044. [PMID: 32954313 PMCID: PMC7425309 DOI: 10.1093/braincomms/fcz044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/03/2019] [Revised: 10/31/2019] [Accepted: 11/17/2019] [Indexed: 11/26/2022]
Abstract
The closed-loop cortico-subcortical pathways of basal ganglia have been extensively used to describe the physiology of these centres and to justify the functional disorders of basal ganglia diseases. This approach justifies some experimental and clinical data but not others, and furthermore, it does not include a number of subcortical circuits that may produce a more complex basal ganglia dynamic than that expected for closed-loop linear networks. This work studied the functional connectivity of the main regions of the basal ganglia motor circuit with magnetic resonance imaging and a new method (functional profile method), which can analyse the multiple covariant activity of human basal ganglia. The functional profile method identified the most frequent covariant functional status (profiles) of the basal ganglia motor circuit, ordering them according to their relative frequency and identifying the most frequent successions between profiles (profile transitions). The functional profile method classified profiles as input profiles that accept the information coming from other networks, output profiles involved in the output of processed information to other networks and highly interconnected internal profiles that accept transitions from input profiles and send transitions to output profiles. Profile transitions showed a previously unobserved functional dynamic of human basal ganglia, suggesting that the basal ganglia motor circuit may work as a dynamic multiple covariance network. The number of internal profiles and internal transitions showed a striking decrease in patients with Parkinson’s disease, a fact not observed for input and output profiles. This suggests that basal ganglia of patients with Parkinson’s disease respond to requirements coming from other neuronal networks, but because the internal processing of information is drastically weakened, its response will be insufficient and perhaps also self-defeating. 
These marked effects were found in patients with few motor disorders, suggesting that the functional profile method may be an early procedure to detect the first stages of the Parkinson’s disease when the motor disorders are not very evident. The multiple covariance activity found presents a complementary point of view to the cortico-subcortical closed-loop model of basal ganglia. The functional profile method may be easily applied to other brain networks, and it may provide additional explanations for the clinical manifestations of other basal ganglia disorders.
Affiliation(s)
- Clara Rodriguez-Sabate
  - Laboratory of Neurobiology and Experimental Neurology, Department of Physiology, Faculty of Medicine, University of La Laguna, Tenerife, Canary Islands 28907, Spain; Center for Networked Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid 28031, Spain; Department of Psychiatry, Getafe University Hospital, Madrid 28031, Spain
- Ingrid Morales
  - Laboratory of Neurobiology and Experimental Neurology, Department of Physiology, Faculty of Medicine, University of La Laguna, Tenerife, Canary Islands 28907, Spain; Center for Networked Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid 28031, Spain
- Ricardo Puertas-Avendaño
  - Laboratory of Neurobiology and Experimental Neurology, Department of Physiology, Faculty of Medicine, University of La Laguna, Tenerife, Canary Islands 28907, Spain
- Manuel Rodriguez
  - Laboratory of Neurobiology and Experimental Neurology, Department of Physiology, Faculty of Medicine, University of La Laguna, Tenerife, Canary Islands 28907, Spain; Center for Networked Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid 28031, Spain
11
|
Costanza P, Herzeel C, Verachtert W. A comparison of three programming languages for a full-fledged next-generation sequencing tool. BMC Bioinformatics 2019; 20:301. [PMID: 31159721 PMCID: PMC6547519 DOI: 10.1186/s12859-019-2903-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/21/2019] [Accepted: 05/15/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND: elPrep is an established multi-threaded framework for preparing SAM and BAM files in sequencing pipelines. To achieve good performance, its software architecture makes only a single pass through a SAM/BAM file for multiple preparation steps, and keeps sequencing data as much as possible in main memory. Similar to other SAM/BAM tools, management of heap memory is a complex task in elPrep, and it became a serious productivity bottleneck in its original implementation language during recent further development of elPrep. We therefore investigated three alternative programming languages: Go and Java using a concurrent, parallel garbage collector on the one hand, and C++17 using reference counting on the other hand for handling large amounts of heap objects. We reimplemented elPrep in all three languages and benchmarked their runtime performance and memory use.
RESULTS: The Go implementation performs best, yielding the best balance between runtime performance and memory use. While the Java benchmarks report a somewhat faster runtime than the Go benchmarks, the memory use of the Java runs is significantly higher. The C++17 benchmarks run significantly slower than both Go and Java, while using somewhat more memory than the Go runs. Our analysis shows that concurrent, parallel garbage collection is better at managing a large heap of objects than reference counting in our case.
CONCLUSIONS: Based on our benchmark results, we selected Go as our new implementation language for elPrep, and recommend considering Go as a good candidate for developing other bioinformatics tools for processing SAM/BAM data as well.
|
12
|
Shen W, Le S, Li Y, Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS One 2016; 11:e0163962. [PMID: 27706213 PMCID: PMC5051824 DOI: 10.1371/journal.pone.0163962] [Citation(s) in RCA: 1616] [Impact Index Per Article: 179.6] [Received: 05/23/2016] [Accepted: 09/16/2016] [Indexed: 11/23/2022]
Abstract
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools implement only some of these manipulations, and not particularly efficiently, and some are available only for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user friendly. This paper describes a cross-platform, ultrafast, comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be used directly without any dependencies or pre-configuration. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit.
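To make one of the manipulations listed above concrete, here is a minimal sketch of FASTA parsing followed by deduplication by sequence. It is written in plain Python for illustration and is not SeqKit's implementation. The function names `parse_fasta` and `deduplicate` are hypothetical, and the case-insensitive comparison is an assumption of this sketch.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Sequence lines belonging to one record are concatenated.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deduplicate(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


if __name__ == "__main__":
    sample = ">a\nACGT\nAC\n>b\nacgtac\n>c\nGG\n"
    for header, seq in deduplicate(parse_fasta(sample)):
        print(f">{header}\n{seq}")
```

Holding every distinct sequence in a set is simple but memory-hungry for large files; a production tool would more likely store a hash of each sequence instead.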
Affiliation(s)
- Wei Shen
- Department of Microbiology, College of Basic Medical Sciences, Third Military Medical University, 30# Gaotanyan St., Shapingba District, Chongqing, China
- Shuai Le
- Department of Microbiology, College of Basic Medical Sciences, Third Military Medical University, 30# Gaotanyan St., Shapingba District, Chongqing, China
- Yan Li
- Medical Research Center, Southwest Hospital, Third Military Medical University, 29# Gaotanyan St., Shapingba District, Chongqing, China
- * E-mail: (YL); (FH)
- Fuquan Hu
- Department of Microbiology, College of Basic Medical Sciences, Third Military Medical University, 30# Gaotanyan St., Shapingba District, Chongqing, China
- * E-mail: (YL); (FH)
|
13
|
Cai Y, Wang N, Wu X, Zheng K, Li Y. Compensatory variances of drug-induced hepatitis B virus YMDD mutations. SpringerPlus 2016; 5:1340. [PMID: 27588233 PMCID: PMC4987753 DOI: 10.1186/s40064-016-3003-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/07/2015] [Accepted: 08/05/2016] [Indexed: 12/22/2022]
Abstract
Although drug-induced mutations of HBV have been documented, the evolutionary mechanism remains obscure. To characterize HBV evolution under this special condition at the molecular level, we comprehensively investigated the molecular variation of 3432 wild-type sequences and 439 YMDD variants from HBV genotypes A, B, C and D, and evaluated the co-variant patterns and their frequency distribution across YMDD mutation types and genotypes, using the naïve Bayes classification algorithm and the complete induction method based on comparative sequence analysis. The data showed that different compensatory changes followed rtM204I/V. Although occurrence of the YMDD mutation itself was not related to HBV genotype, the subsequent co-variant patterns were related to both YMDD variant type and HBV genotype. From a hierarchical view, we distinguished historical mutations, drug-induced mutations and compensatory variances, and showed an inter-conditioned relationship among amino acid variances across multiple evolutionary processes. This study extends the understanding of the polymorphism and fitness of viral protein.
Affiliation(s)
- Ying Cai
- Department of Infectious Diseases, No. 324 Hospital of PLA, Chongqing, 400020 China
- Ning Wang
- Medical Research Center, Southwest Hospital, Third Military Medical University, Chongqing, 400038 China
- Xiaomei Wu
- Department of Infectious Diseases, No. 324 Hospital of PLA, Chongqing, 400020 China
- Kai Zheng
- Medical Research Center, Southwest Hospital, Third Military Medical University, Chongqing, 400038 China
- Yan Li
- Medical Research Center, Southwest Hospital, Third Military Medical University, Chongqing, 400038 China; Department of Microbiology, Third Military Medical University, Chongqing, 400038 China
|