1
Qiao S, Li X, Olatosi B, Young SD. Utilizing Big Data analytics and electronic health record data in HIV prevention, treatment, and care research: a literature review. AIDS Care 2024; 36:583-603. [PMID: 34260325 DOI: 10.1080/09540121.2021.1948499] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/13/2021] [Accepted: 06/22/2021] [Indexed: 01/07/2023]
Abstract
Propelled by the transformative power of modern information and communication technologies, digitalization of data, and the increasing affordability of high-performance computing, Big Data science has brought revolutionary advances to many areas of business, industry, health, and medicine. The HIV research and care service community has likewise benefited from the availability and use of Big Data analytics. Electronic health record (EHR) data (e.g., administrative and billing data, electronic medical records, or other digital records of information pertinent to individual or population health) are an essential source of health and disease outcome data because they are large-scale, real-world, comprehensive, and often longitudinal; they therefore offer a good opportunity to apply advanced Big Data analytics to challenges in HIV prevention, treatment, and care. This review focuses on studies that apply Big Data analytics to EHR data, with the aims of synthesizing the HIV-related issues that EHR data studies can tackle, identifying challenges in the use of EHR data in HIV research and practice, and discussing future needs and directions for realizing the promising role of Big Data in ending the HIV epidemic.
Affiliation(s)
- Shan Qiao
- South Carolina SmartState Center for Healthcare Quality (CHQ), Columbia, SC, USA
- University of South Carolina Big Data Health Science Center, Columbia, SC, USA
- Department of Health Promotion, Education, and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Xiaoming Li
- South Carolina SmartState Center for Healthcare Quality (CHQ), Columbia, SC, USA
- University of South Carolina Big Data Health Science Center, Columbia, SC, USA
- Department of Health Promotion, Education, and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Bankole Olatosi
- South Carolina SmartState Center for Healthcare Quality (CHQ), Columbia, SC, USA
- University of South Carolina Big Data Health Science Center, Columbia, SC, USA
- Department of Health Services Policy and Management, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
- Sean D Young
- Department of Emergency Medicine, Department of Informatics, Institute for Prediction Technology, University of California, Irvine, CA, USA
2
Juyal A, Hosseini R, Novikov D, Grinshpon M, Zelikovsky A. Reconstruction of Viral Variants via Monte Carlo Clustering. J Comput Biol 2023; 30:1009-1018. [PMID: 37695837 PMCID: PMC10518690 DOI: 10.1089/cmb.2023.0154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2023]
Abstract
Identifying viral variants through clustering is essential for understanding the composition and structure of viral populations within and between hosts, which play a crucial role in disease progression and epidemic spread. This article proposes and validates novel Monte Carlo (MC) methods for clustering aligned viral sequences by minimizing either entropy or Hamming distance from cluster consensuses. We validate these methods on four benchmarks: two SARS-CoV-2 interhost data sets and two HIV intrahost data sets. A parallelized version of our tool is scalable to very large data sets. We show that both entropy- and Hamming distance-based MC clusterings extract meaningful information from sequencing data. The proposed clustering methods consistently converge to similar clusterings across different runs. Finally, we show that MC clustering improves reconstruction of the intrahost viral population from sequencing data.
Affiliation(s)
- Akshay Juyal
- Department of Computer Science and Georgia State University, Atlanta, Georgia, USA
- Roya Hosseini
- Department of Computer Science and Georgia State University, Atlanta, Georgia, USA
- Daniel Novikov
- Department of Computer Science and Georgia State University, Atlanta, Georgia, USA
- Mark Grinshpon
- Department of Mathematics and Statistics, Georgia State University, Atlanta, Georgia, USA
- Alex Zelikovsky
- Department of Computer Science and Georgia State University, Atlanta, Georgia, USA
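The Monte Carlo clustering idea in the Juyal et al. abstract above can be illustrated with a small sketch. This is not the authors' tool: it assumes equal-length aligned sequences, uses only the Hamming-distance objective (not the entropy variant), and applies a simple greedy acceptance rule in which a random single-sequence move is kept only if the total distance to the cluster consensuses does not increase. The function names `consensus`, `clustering_cost`, and `mc_cluster` are hypothetical.

```python
import random
from collections import Counter


def consensus(seqs):
    """Column-wise majority consensus of equal-length aligned sequences."""
    # Ties go to the first-seen character (Counter keeps insertion order).
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def clustering_cost(seqs, labels, k):
    """Total Hamming distance of every sequence to its own cluster's consensus."""
    total = 0
    for cluster in range(k):
        members = [s for s, lab in zip(seqs, labels) if lab == cluster]
        if members:
            center = consensus(members)
            total += sum(hamming(s, center) for s in members)
    return total


def mc_cluster(seqs, k, iterations=2000, seed=0):
    """Greedy Monte Carlo search (hypothetical simplification): propose random
    single-sequence moves and keep a move only if the cost does not increase."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in seqs]
    cost = clustering_cost(seqs, labels, k)
    for _ in range(iterations):
        i = rng.randrange(len(seqs))
        old, new = labels[i], rng.randrange(k)
        if new == old:
            continue
        labels[i] = new
        new_cost = clustering_cost(seqs, labels, k)
        if new_cost <= cost:
            cost = new_cost  # accept the move
        else:
            labels[i] = old  # reject the move and restore the old label
    return labels, cost


reads = ["AAAAAAAA", "AAAAAAAT", "AAAAAATA", "CCCCCCCC", "CCCCCCCG", "CCCCCGCC"]
labels, cost = mc_cluster(reads, k=2)
print(labels, cost)
```

Because only non-increasing moves are accepted, this variant can stall in a local optimum on harder data; accepting occasional uphill moves (for example, a Metropolis rule) is a common remedy, though whether the published method does so is not stated in the abstract.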
3
Abduljalil JM, Elghareib AM, Samir A, Ezat AA, Elfiky AA. How helpful were molecular dynamics simulations in shaping our understanding of SARS-CoV-2 spike protein dynamics? Int J Biol Macromol 2023:125153. [PMID: 37268078 DOI: 10.1016/j.ijbiomac.2023.125153] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/07/2023] [Revised: 03/22/2023] [Accepted: 05/27/2023] [Indexed: 06/04/2023]
Abstract
The SARS-CoV-2 spike protein (S) is an important viral component required for successful infection in humans owing to its essential role in recognition of and entry into host cells. The spike is also an appealing target for drug designers who develop vaccines and antivirals. This article is important as it summarizes how molecular simulations successfully shaped our understanding of spike conformational behavior and its role in viral infection. Molecular dynamics (MD) simulations found that the higher affinity of SARS-CoV-2 S for ACE2 is linked to its unique residues, which add extra electrostatic and van der Waals interactions compared with SARS-CoV S. This illustrates the spread potential of the pandemic SARS-CoV-2 relative to the epidemic SARS-CoV. Different mutations at the S-ACE2 interface, which are believed to increase the transmission of the new variants, affected the behavior and binding interactions in different simulations. Simulations also revealed the contributions of glycans to the opening of S. The immune evasion of S was linked to the spatial distribution of glycans, which helps the virus escape recognition by the immune system. This will help us prepare for the next pandemic as computational tools are tailored to fight new challenges.
Affiliation(s)
- Jameel M Abduljalil
- Department of Biological Sciences, Faculty of Applied Sciences, Thamar University, Dhamar, Yemen; Department of Botany and Microbiology, College of Science, Cairo University, Giza, Egypt
- Ahmed M Elghareib
- Department of Biophysics, Faculty of Science, Cairo University, Giza, Egypt
- Ahmed Samir
- Department of Biophysics, Faculty of Science, Cairo University, Giza, Egypt
- Ahmed A Ezat
- Department of Biophysics, Faculty of Science, Cairo University, Giza, Egypt
- Abdo A Elfiky
- Department of Biophysics, Faculty of Science, Cairo University, Giza, Egypt
4
Luo X, Kang X, Schönhuth A. Strainline: full-length de novo viral haplotype reconstruction from noisy long reads. Genome Biol 2022; 23:29. [PMID: 35057847 PMCID: PMC8771625 DOI: 10.1186/s13059-021-02587-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/25/2021] [Accepted: 12/17/2021] [Indexed: 12/02/2022]
Abstract
Haplotype-resolved de novo assembly of highly diverse virus genomes is critical for the prevention, control, and treatment of viral diseases. Current methods either can handle only relatively accurate short-read data or collapse haplotype-specific variations into a consensus sequence. Here, we present Strainline, a novel approach to assemble viral haplotypes from noisy long reads without a reference genome. Strainline is the first approach to provide strain-resolved, full-length de novo assemblies of viral quasispecies from noisy third-generation sequencing data. Benchmarking on simulated and real datasets of varying complexity and diversity confirms this novelty and demonstrates the superiority of Strainline.
5
Dhar S, Zhang C, Măndoiu II, Bansal MS. TNet: Transmission Network Inference Using Within-Host Strain Diversity and its Application to Geographical Tracking of COVID-19 Spread. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:230-242. [PMID: 34255632 PMCID: PMC8956368 DOI: 10.1109/tcbb.2021.3096455] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/06/2020] [Revised: 07/03/2021] [Accepted: 07/08/2021] [Indexed: 06/13/2023]
Abstract
The inference of disease transmission networks is an important problem in epidemiology. One popular approach for building transmission networks is to reconstruct a phylogenetic tree using sequences from disease strains sampled from infected hosts and infer transmissions based on this tree. However, most existing phylogenetic approaches for transmission network inference are highly computationally intensive and cannot take within-host strain diversity into account. Here, we introduce a new phylogenetic approach for inferring transmission networks, TNet, that addresses these limitations. TNet uses multiple strain sequences from each sampled host to infer transmissions and is simpler and more accurate than existing approaches. Furthermore, TNet is highly scalable and able to distinguish between ambiguous and unambiguous transmission inferences. We evaluated TNet on a large collection of 560 simulated transmission networks of various sizes and diverse host, sequence, and transmission characteristics, as well as on 10 real transmission datasets with known transmission histories. Our results show that TNet outperforms two other recently developed methods, phyloscanner and SharpTNI, that also consider within-host strain diversity. We also applied TNet to a large collection of SARS-CoV-2 genomes sampled from infected individuals in many countries around the world, demonstrating how our inference framework can be adapted to accurately infer geographical transmission networks. TNet is freely available from https://compbio.engr.uconn.edu/software/TNet/.
Affiliation(s)
- Saurav Dhar
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Chengchen Zhang
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Ion I. Măndoiu
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Mukul S. Bansal
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
6
Pham LM, Parlavantzas N, Le HH, Bui QH. Towards a Framework for High-Performance Simulation of Livestock Disease Outbreak: A Case Study of Spread of African Swine Fever in Vietnam. Animals (Basel) 2021; 11:ani11092743. [PMID: 34573709 PMCID: PMC8469528 DOI: 10.3390/ani11092743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2021] [Revised: 09/10/2021] [Accepted: 09/14/2021] [Indexed: 12/04/2022]
Abstract
Simple Summary: Disease transmission simulation programs in veterinary epidemiology, and for African swine fever in particular, are often very diverse and require great computing power. However, such programs often share similar workflows, such as processing input/output data, performing simulations, and storing data. Our paper proposes a common architectural framework for livestock disease transmission simulation programs in order to both improve simulation performance and reduce the effort of developing new simulation programs. Our framework was evaluated with a simulation program of African swine fever, which is currently raging in Vietnam and several other countries around the world. The results from the evaluation experiments not only demonstrate the effectiveness of the framework in terms of performance but also have practical consulting value for decision makers in Vietnam and for international colleagues.
Abstract: The spread of disease in livestock is an important research topic in veterinary epidemiology because it provides warnings or advice to organizations responsible for the protection of animal health in particular and public health in general. Disease transmission simulation programs are often deployed with different species, disease types, or epidemiological models, and each research team manages its own set of parameters relevant to its target diseases and concerns, resulting in limited cooperation and reuse of research results. Furthermore, these simulation and decision support tools often require a large amount of computational power, especially for models involving tens of thousands of herds with millions of individuals spread over a large geographical area such as a region or a country. Epidemic simulation programs are often heterogeneous, but they usually share some common workflows, including input data processing and simulation execution, as well as storage, analysis, and visualization of results. In this article, we propose a novel architectural framework for simultaneously deploying any epidemic simulation program both on premises and on the cloud to improve performance and scalability. We also conduct experiments to evaluate several aspects of the proposed architectural framework when applying it to simulate the spread of African swine fever in Vietnam.
Affiliation(s)
- Linh Manh Pham
- University of Engineering and Technology, Vietnam National University, 144 Xuan Thuy, Cau Giay, Hanoi 10000, Vietnam
- Nikos Parlavantzas
- Campus Universitaire de Beaulieu, Université de Rennes, Inria, CNRS, IRISA, 35042 Rennes, France
- Huy-Ham Le
- University of Engineering and Technology, Vietnam National University, 144 Xuan Thuy, Cau Giay, Hanoi 10000, Vietnam
- Agricultural Genetics Institute, Pham Van Dong, Bac Tu Liem, Hanoi 10000, Vietnam
- Quang Hung Bui
- University of Engineering and Technology, Vietnam National University, 144 Xuan Thuy, Cau Giay, Hanoi 10000, Vietnam
7
Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Technology dictates algorithms: recent developments in read alignment. Genome Biol 2021; 22:249. [PMID: 34446078 PMCID: PMC8390189 DOI: 10.1186/s13059-021-02443-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/15/2020] [Accepted: 07/28/2021] [Indexed: 01/08/2023]
Abstract
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
Affiliation(s)
- Mohammed Alser
- Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
- Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
- Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Jeremy Rotman
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Dhrithi Deshpande
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
- Kodi Taraszka
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Pelin Icer Baykal
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Harry Taegyun Yang
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Ph.D. Program, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Victor Xue
- Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Sergey Knyazev
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Benjamin D Singer
- Division of Pulmonary and Critical Care Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Department of Biochemistry & Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, USA
- Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Brunilda Balliu
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
- David Koslicki
- Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16801, USA
- Biology Department, Pennsylvania State University, University Park, PA, 16801, USA
- The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16801, USA
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
- Can Alkan
- Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
- Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey
- Onur Mutlu
- Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
- Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
- Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA.
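As a concrete instance of the seed-based indexing that many of the aligners surveyed by Alser et al. build on, the sketch below indexes every k-mer of a reference and lets exact k-mer hits vote for a read's starting position (hit position minus the k-mer's offset within the read). It is a simplified illustration under stated assumptions, not any specific aligner from the survey: it ignores reverse complements, performs no gapped extension or scoring, and the names `build_kmer_index` and `seed_and_vote` are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k):
    """Return (best_start, votes) for the read, or None if no k-mer matches.

    Each exact k-mer hit at reference position p, taken from read offset o,
    votes for the read starting at p - o; a mismatch only removes the votes
    of the k-mers that overlap it.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        # .get avoids inserting empty lists into the defaultdict
        for pos in index.get(read[offset:offset + k], []):
            votes[pos - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


reference = "TTGACCGATGCAAGTCCT"
index = build_kmer_index(reference, k=4)
# reference[5:15] with one substitution at read position 4 (G -> T)
print(seed_and_vote("CGATTCAAGT", index, k=4))  # → (5, 3)
```

A real aligner would follow this candidate-location step with a verification or dynamic-programming alignment around the winning position; the vote count here only ranks candidates.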
8
Knyazev S, Tsyvina V, Shankar A, Melnyk A, Artyomenko A, Malygina T, Porozov YB, Campbell EM, Switzer WM, Skums P, Mangul S, Zelikovsky A. Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction. Nucleic Acids Res 2021; 49:e102. [PMID: 34214168 PMCID: PMC8464054 DOI: 10.1093/nar/gkab576] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/09/2020] [Revised: 05/25/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022]
Abstract
Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient’s treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that remains open. Here we propose CliqueSNV, based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identification of minority haplotypes with frequencies below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short- and long-read sequencing platforms.
Affiliation(s)
- Sergey Knyazev
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA; Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA; Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, USA
- Viachaslau Tsyvina
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Anupama Shankar
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Andrew Melnyk
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Tatiana Malygina
- International Scientific and Research Institute of Bioengineering, ITMO University, St. Petersburg 197101, Russia
- Yuri B Porozov
- World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia; Department of Computational Biology, Sirius University of Science and Technology, Sochi 354340, Russia
- Ellsworth M Campbell
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- William M Switzer
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA; World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
9
Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021; 22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020] [Indexed: 01/04/2023]
Abstract
The deep coverage offered by next-generation sequencing (NGS) technology has facilitated the assessment of the complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, analysis of NGS datasets could be used to extract and infer crucial epidemiological and biomedical information on the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters, and structures of transmission networks. However, NGS data require sophisticated analysis dealing with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle large-scale NGS datasets and properly model the evolution of heterogeneous, rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools analyzing NGS data for (i) characterization of intra-host viral population complexity, including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection, and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response and control of outbreaks.
10
Basodi S, Baykal PI, Zelikovsky A, Skums P, Pan Y. Analysis of heterogeneous genomic samples using image normalization and machine learning. BMC Genomics 2020; 21:405. [PMID: 33349236 PMCID: PMC7751093 DOI: 10.1186/s12864-020-6661-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for the analysis of sequence data from such populations, their straightforward application is impeded by multiple challenges associated with technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures. RESULTS: We propose a novel preprocessing approach to transform irregular genomic data into normalized image data. This representation allows the classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method has been applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering has been performed on data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy. CONCLUSIONS: The sequence image normalization method allows robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in visualizing genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are as accurate as or more accurate than other models.
Affiliation(s)
- Sunitha Basodi
- Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Pelin Icer Baykal
- Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA; The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
- Pavel Skums
- Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Yi Pan
- Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
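The Basodi et al. abstract above describes converting irregular genomic samples into normalized images so that samples of different sizes can be compared with standard machine learning tools. The abstract does not spell out the exact transformation, so the sketch below assumes one simple possibility: a fixed-size 4-by-width matrix of per-column nucleotide frequencies (rows A, C, G, T), whose shape does not depend on how many sequences a sample contains. The names `profile_image` and `image_distance` are hypothetical, and the L1 distance is only one example of how two such images might be compared.

```python
NUCLEOTIDES = "ACGT"


def profile_image(seqs, width):
    """Return a 4 x width matrix of per-column nucleotide frequencies.

    Rows are A, C, G, T. Gaps and ambiguous bases are ignored, sequences are
    truncated to `width`, and columns with no coverage are left as zeros.
    """
    counts = [[0] * width for _ in NUCLEOTIDES]
    for seq in seqs:
        for col, base in enumerate(seq[:width]):
            row = NUCLEOTIDES.find(base)  # -1 for characters outside ACGT
            if row >= 0:
                counts[row][col] += 1
    coverage = [sum(counts[r][c] for r in range(len(NUCLEOTIDES))) for c in range(width)]
    return [
        [counts[r][c] / coverage[c] if coverage[c] else 0.0 for c in range(width)]
        for r in range(len(NUCLEOTIDES))
    ]


def image_distance(img1, img2):
    """L1 distance between two frequency images of the same shape."""
    return sum(abs(x - y) for row1, row2 in zip(img1, img2) for x, y in zip(row1, row2))


small_sample = ["ACGT", "ACGA"]
large_sample = ["ACGT"] * 50 + ["ACGA"] * 50
# Samples of very different sizes map to images of the same shape.
print(image_distance(profile_image(small_sample, 4), profile_image(large_sample, 4)))  # → 0.0
```

Flattening each image into a vector would let it be passed to an off-the-shelf classifier such as an SVM, which is the kind of downstream use the abstract reports.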
11
Skums P, Zelikovsky A, Singh R, Gussler W, Dimitrova Z, Knyazev S, Mandric I, Ramachandran S, Campo D, Jha D, Bunimovich L, Costenbader E, Sexton C, O'Connor S, Xia GL, Khudyakov Y. QUENTIN: reconstruction of disease transmissions from viral quasispecies genomic data. Bioinformatics 2018; 34:163-170. [PMID: 29304222 DOI: 10.1093/bioinformatics/btx402] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 04/07/2017] [Accepted: 06/15/2017] [Indexed: 01/08/2023]
Abstract
Motivation: Genomic analysis has become one of the major tools for disease outbreak investigations. However, existing computational frameworks for inferring transmission history from viral genomic data often do not consider intra-host diversity of pathogens and rely heavily on additional epidemiological data, such as sampling times and exposure intervals. This impedes genomic analysis of outbreaks of highly mutable viruses associated with chronic infections, such as human immunodeficiency virus and hepatitis C virus, whose transmissions are often carried out through minor intra-host variants and for which additional epidemiological information is often either unavailable or of limited use. Results: The proposed framework, QUasispecies Evolution, Network-based Transmission INference (QUENTIN), addresses these challenges through evolutionary analysis of intra-host viral populations sampled by deep sequencing and Bayesian inference using general properties of social networks relevant to infection dissemination. The method allows inference of transmission direction even without supporting case-specific epidemiological information, identification of transmission clusters, and reconstruction of transmission history. QUENTIN was validated on experimental and simulated data and applied to investigate HCV transmission within a community of hosts with high-risk behavior. It is available at https://github.com/skumsp/QUENTIN. Contact: pskums@gsu.edu or alexz@cs.gsu.edu or rahul@sfsu.edu or yek0@cdc.gov. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pavel Skums
- Department of Computer Science, Georgia State University; Centers for Disease Control and Prevention, Division of Viral Hepatitis, Atlanta, GA 30303, USA
- Rahul Singh
- Department of Computer Science, San Francisco State University, San Francisco, CA 94132, USA
- Walker Gussler
- Centers for Disease Control and Prevention, Division of Viral Hepatitis, Atlanta, GA 30303, USA
- Zoya Dimitrova
- Centers for Disease Control and Prevention, Division of Viral Hepatitis, Atlanta, GA 30303, USA
- Sergey Knyazev
- Department of Computer Science, Georgia State University
- Igor Mandric
- Department of Computer Science, Georgia State University
- Sumathi Ramachandran
- Centers for Disease Control and Prevention, Division of Viral Hepatitis, Atlanta, GA 30303, USA
- David Campo
- Centers for Disease Control and Prevention, Division of Viral Hepatitis, Atlanta, GA 30303, USA
- Deeptanshu Jha
- Department of Computer Science, San Francisco State University, San Francisco, CA 94132, USA
- Leonid Bunimovich
- School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30313, USA
- Connie Sexton
- Centers for Disease Control and Prevention, Division of Viral Hepatitis, Atlanta, GA 30303, USA; Division of Global HIV and TB, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Siobhan O'Connor
- Centers for Disease Control and Prevention, Division of Viral Hepatitis, Atlanta, GA 30303, USA; Division of HIV/AIDS Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Guo-Liang Xia
- Centers for Disease Control and Prevention, Division of Viral Hepatitis, Atlanta, GA 30303, USA
- Yury Khudyakov
- Centers for Disease Control and Prevention, Division of Viral Hepatitis, Atlanta, GA 30303, USA
| |
12
Mu Y, Kodidela S, Wang Y, Kumar S, Cory TJ. The dawn of precision medicine in HIV: state of the art of pharmacotherapy. Expert Opin Pharmacother 2018; 19:1581-1595. [PMID: 30234392 DOI: 10.1080/14656566.2018.1515916] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
INTRODUCTION Combination antiretroviral therapy (ART) reduces viral load below the limit of detection, successfully decreasing HIV-related morbidity and mortality. Because of viral mutations, complex drug combinations and variable patient responses, there is an increasing demand for individualized treatment options. AREAS COVERED This review first summarizes the pharmacokinetic and pharmacodynamic profiles of clinical first-line drugs, which serve as guidance for antiretroviral precision medicine. Factors that influence drug efficacy, and thus precision medicine, are discussed: patients' pharmacogenetic information, virus mutations, comorbidities, and immune recovery. Furthermore, strategies to improve the application of precision medicine are discussed. EXPERT OPINION Precision medicine for ART requires comprehensive information on the drug, the virus, and the patient's clinical data. The clinically available genetic tests are a good starting point. To apply precision medicine better, deeper knowledge is required of drug concentrations, HIV reservoirs, and efficacy-associated genes, such as polymorphisms of drug transporters and metabolizing enzymes. With advanced computer-based prediction systems that integrate more comprehensive information on pharmacokinetics, pharmacodynamics, pharmacogenomics, and the clinically relevant information of the patients, precision medicine will lead to better treatment choices and improved disease outcomes.
Affiliation(s)
- Ying Mu
- Department of Clinical Pharmacy and Translational Science, University of Tennessee Health Science Center College of Pharmacy, Memphis, USA
- Sunitha Kodidela
- Department of Pharmaceutical Science, University of Tennessee Health Science Center College of Pharmacy, Memphis, USA
- Yujie Wang
- Department of Pharmaceutical Science, University of Tennessee Health Science Center College of Pharmacy, Memphis, USA
- Santosh Kumar
- Department of Pharmaceutical Science, University of Tennessee Health Science Center College of Pharmacy, Memphis, USA
- Theodore J Cory
- Department of Clinical Pharmacy and Translational Science, University of Tennessee Health Science Center College of Pharmacy, Memphis, USA
13
Montazeri H, Kuipers J, Kouyos R, Böni J, Yerly S, Klimkait T, Aubert V, Günthard HF, Beerenwinkel N. Large-scale inference of conjunctive Bayesian networks. Bioinformatics 2017; 32:i727-i735. [PMID: 27587695 DOI: 10.1093/bioinformatics/btw459] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
UNLABELLED The continuous time conjunctive Bayesian network (CT-CBN) is a graphical model for analyzing the waiting time process of the accumulation of genetic changes (mutations). CT-CBN models have been successfully used in several biological applications, such as HIV drug resistance development and the genetic progression of cancer. However, current approaches for parameter estimation and network structure learning of CBNs can only deal with a small number of mutations (<20). Here, we address this limitation by presenting an efficient and accurate approximate inference algorithm using a Monte Carlo expectation-maximization algorithm based on importance sampling. The new method can be used for a large number of mutations, up to one thousand, an increase of two orders of magnitude. In simulation studies, we present the accuracy and running-time efficiency of the new inference method and compare it with an MLE method, expectation-maximization, and the discrete time CBN model, i.e. a first-order approximation of the CT-CBN model. We also apply the new model to HIV drug resistance datasets for combination therapy with zidovudine plus lamivudine (AZT + 3TC), as well as under no treatment, both extracted from the Swiss HIV Cohort Study database. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented as an R package available at https://github.com/cbg-ethz/MC-CBN. CONTACT: niko.beerenwinkel@bsse.ethz.ch SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
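The abstract does not spell out the CT-CBN generative process. In the formulation assumed here (following Beerenwinkel and Sullivant, 2009), each mutation's exponential waiting time starts only once all of its parent mutations in the partial order have occurred, and the genotype is read off at an independent, exponentially distributed sampling time. The plain-Python sketch below simulates that process. It is an illustration only, not the MC-CBN R package or its Monte Carlo EM; the three-mutation poset and all rates are hypothetical.

```python
import random


def simulate_ct_cbn(parents, rates, sampling_rate, rng):
    """Draw one genotype from a CT-CBN (assumed formulation, see lead-in).

    parents: dict mapping each mutation to the set of its parent mutations (a DAG)
    rates: dict mapping each mutation to its exponential rate lambda_i
    sampling_rate: rate lambda_s of the exponential sampling time
    rng: a random.Random instance
    """
    times = {}

    def occurrence_time(m):
        # T_m = max(T_parent over parents of m) + Exp(lambda_m), memoized
        if m not in times:
            start = max((occurrence_time(p) for p in parents[m]), default=0.0)
            times[m] = start + rng.expovariate(rates[m])
        return times[m]

    t_sample = rng.expovariate(sampling_rate)
    # A mutation is observed iff it occurred before the sampling time
    return {m: int(occurrence_time(m) < t_sample) for m in parents}


# Hypothetical poset: mutations B and C can only occur after A
parents = {"A": set(), "B": {"A"}, "C": {"A"}}
rates = {"A": 1.0, "B": 0.5, "C": 2.0}
rng = random.Random(0)
samples = [simulate_ct_cbn(parents, rates, 1.0, rng) for _ in range(1000)]
```

Every simulated genotype respects the partial order: B or C never appears without A, because a child's occurrence time can never precede its parent's.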
Affiliation(s)
- Hesam Montazeri
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Roger Kouyos
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland; Institute of Medical Virology
- Jürg Böni
- Swiss National Center for Retroviruses, Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Sabine Yerly
- Laboratory of Virology, Division of Infectious Diseases, Geneva University Hospital, Geneva, Switzerland
- Thomas Klimkait
- Department of Biomedicine-Petersplatz, University of Basel, Basel, Switzerland
- Vincent Aubert
- Division of Immunology and Allergy, University Hospital Lausanne, Lausanne, Switzerland
- Huldrych F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland; Institute of Medical Virology
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Basel, Switzerland
14
Gupta SK, Gross R, Dandekar T. An antibiotic target ranking and prioritization pipeline combining sequence, structure and network-based approaches exemplified for Serratia marcescens. Gene 2016; 591:268-278. [PMID: 27425866 DOI: 10.1016/j.gene.2016.07.030] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/02/2016] [Revised: 06/26/2016] [Accepted: 07/12/2016] [Indexed: 01/20/2023]
Abstract
We investigate a drug target screening pipeline that compares sequence-, structure- and network-based criteria for prioritization, using the opportunistic pathogen Serratia marcescens as a test case. We rank candidates according to (i) the availability of three-dimensional structures and lead compounds, (ii) absence from humans and general sequence conservation, and (iii) network information on the importance of the protein (conserved protein-protein interactions; metabolism; reported essentiality in other organisms). We identify 45 potential antimicrobial drug targets in S. marcescens, with KdsA, which is involved in LPS biosynthesis, as the top candidate. LpxC and FlgB are further top-ranked targets, identified by interactome analysis, that had not previously been suggested for S. marcescens. The pipeline, the targets, and the complementarity of the three approaches are evaluated against available experimental data and genetic evidence, and against other antibiotic screening pipelines. These bundled complementary criteria support reliable drug target identification and prioritization for infectious agents (bacteria, parasites, fungi).
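The abstract ranks candidates on three groups of criteria but does not say how the rankings are combined. One simple way to combine them, assumed here and not taken from the paper, is rank-sum aggregation: rank all candidates separately under each criterion and order them by the sum of their ranks. The target names and scores below are hypothetical.

```python
def aggregate_ranks(scores):
    """Order candidates by the sum of their per-criterion ranks (rank 1 = best).

    scores: dict mapping candidate -> tuple of per-criterion scores (higher is better).
    Ties within a criterion are broken alphabetically, a simplification.
    """
    candidates = sorted(scores)
    n_criteria = len(scores[candidates[0]])
    total = {c: 0 for c in candidates}
    for k in range(n_criteria):
        # Stable sort: candidates with equal scores keep alphabetical order
        ordered = sorted(candidates, key=lambda c: -scores[c][k])
        for rank, c in enumerate(ordered, start=1):
            total[c] += rank
    # Lowest rank sum first; alphabetical order breaks ties in the sum
    return sorted(candidates, key=lambda c: (total[c], c))


# Hypothetical targets scored on (i) structures/lead compounds,
# (ii) absence from humans plus conservation, (iii) network importance
scores = {
    "T1": (0.9, 0.2, 0.5),
    "T2": (0.4, 0.8, 0.9),
    "T3": (0.1, 0.5, 0.2),
}
ranking = aggregate_ranks(scores)
print(ranking)  # ['T2', 'T1', 'T3']
```

Rank sums ignore the scale of each criterion, which is convenient when the criteria (structural availability, conservation, network centrality) are measured in incomparable units; a weighted score would be the natural alternative.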
Affiliation(s)
- Shishir K Gupta
- Department of Bioinformatics, Biocenter, Am Hubland, D-97074 Würzburg, Germany; Department of Microbiology, Biocenter, Am Hubland, D-97074 Würzburg, Germany.
- Roy Gross
- Department of Microbiology, Biocenter, Am Hubland, D-97074 Würzburg, Germany.
- Thomas Dandekar
- Department of Bioinformatics, Biocenter, Am Hubland, D-97074 Würzburg, Germany; EMBL Heidelberg, BioComputing Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany.
15
Abstract
Antiviral drug resistance is a matter of great clinical importance that, historically, has been investigated mostly from a virological perspective. Although the proximate mechanisms of resistance can be readily uncovered using these methods, larger evolutionary trends often remain elusive. Recent interest by population geneticists in studies of antiviral resistance has spurred new metrics for evaluating mutation and recombination rates, demographic histories of transmission and compartmentalization, and selective forces incurred during viral adaptation to antiviral drug treatment. We present up-to-date summaries on antiviral resistance for a range of drugs and viral types, and review recent advances for studying their evolutionary histories. We conclude that information imparted by demographic and selective histories, as revealed through population genomic inference, is integral to assessing the evolution of antiviral resistance as it pertains to human health.
Affiliation(s)
- Kristen K Irwin
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Nicholas Renzette
- Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, Massachusetts, USA
- Timothy F Kowalik
- Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, Massachusetts, USA
- Jeffrey D Jensen
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
16
Illingworth CJR. Fitness Inference from Short-Read Data: Within-Host Evolution of a Reassortant H5N1 Influenza Virus. Mol Biol Evol 2015; 32:3012-26. [PMID: 26243288 PMCID: PMC4651230 DOI: 10.1093/molbev/msv171] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 11/28/2022]
Abstract
We present a method to infer the role of selection acting during the within-host evolution of the influenza virus from short-read genome sequence data. Linkage disequilibrium between loci is accounted for by treating short-read sequences as noisy multilocus emissions from an underlying model of haplotype evolution. A hierarchical model-selection procedure is used to infer the underlying fitness landscape of the virus insofar as that landscape is explored by the viral population. In a first application of our method, we analyze data from an evolutionary experiment describing the growth of a reassortant H5N1 virus in ferrets. Across two sets of replica experiments we infer multiple alleles to be under selection, including variants associated with receptor binding specificity, glycosylation, and with the increased transmissibility of the virus. We identify epistasis as an important component of the within-host fitness landscape, and show that adaptation can proceed through multiple genetic pathways.
17
Montazeri H, Günthard HF, Yang WL, Kouyos R, Beerenwinkel N. Estimating the dynamics and dependencies of accumulating mutations with applications to HIV drug resistance. Biostatistics 2015; 16:713-26. [PMID: 25979750 DOI: 10.1093/biostatistics/kxv019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/01/2014] [Accepted: 03/13/2015] [Indexed: 12/14/2022]
Abstract
We introduce a new model called the observed time conjunctive Bayesian network (OT-CBN) that describes the accumulation of genetic events (mutations) under partial temporal ordering constraints. Unlike other CBN models, the OT-CBN model uses the sampling time points of genotypes, in addition to the genotypes themselves, to estimate model parameters. We developed an expectation-maximization algorithm that accounts for this additional information to obtain approximate maximum likelihood estimates. In a simulation study, we show that the OT-CBN model outperforms the continuous time CBN (CT-CBN) (Beerenwinkel and Sullivant, 2009. Markov models for accumulating mutations. Biometrika 96(3): 645-661), which does not take individual sampling times into account for parameter estimation. We also show the superiority of the OT-CBN model on several datasets of HIV drug resistance mutations extracted from the Swiss HIV Cohort Study database.
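The abstract does not give the OT-CBN likelihood. To illustrate why observed sampling times are informative, the sketch below handles only the special case of an empty poset (all mutations independent): given a known sampling time t, mutation i is present with probability 1 - exp(-λ_i t), and a genotype's probability is the product over mutations. With a non-empty poset the waiting times of dependent mutations add up and this simple product no longer applies; the paper's EM algorithm for that general case is not reproduced here. The rates are hypothetical.

```python
import math
from itertools import product


def genotype_prob_empty_poset(genotype, rates, t):
    """P(genotype | observed sampling time t) for independent mutations.

    Assumes an empty poset: mutation i has occurred by time t with
    probability 1 - exp(-rates[i] * t), independently of the others.
    """
    p = 1.0
    for present, lam in zip(genotype, rates):
        occurred = 1.0 - math.exp(-lam * t)
        p *= occurred if present else 1.0 - occurred
    return p


def log_likelihood(data, rates):
    """Log-likelihood of (genotype, sampling_time) pairs under the empty poset."""
    return sum(math.log(genotype_prob_empty_poset(g, rates, t)) for g, t in data)


rates = [0.2, 1.0]  # hypothetical rates for two mutations
# Probabilities of all four genotypes at t = 3 sum to 1
total = sum(genotype_prob_empty_poset(g, rates, 3.0) for g in product((0, 1), repeat=2))
```

Because t enters each term, two patients with the same genotype but different sampling times contribute different likelihoods, which is the extra information a model that ignores individual sampling times cannot use.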
Affiliation(s)
- Hesam Montazeri
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland and SIB Swiss Institute of Bioinformatics, Basel 4058, Switzerland
- Huldrych F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich 8091, Switzerland; Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Wan-Lin Yang
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich 8091, Switzerland; Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Roger Kouyos
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich 8091, Switzerland; Institute of Medical Virology, University of Zurich, Zurich 8057, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
18
Christley S, Cockrell C, An G. Computational Studies of the Intestinal Host-Microbiota Interactome. COMPUTATION (BASEL, SWITZERLAND) 2015; 3:2-28. [PMID: 34765258 PMCID: PMC8580329 DOI: 10.3390/computation3010002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
A large and growing body of research implicates aberrant immune responses and compositional shifts of the intestinal microbiota in the pathogenesis of many intestinal disorders. The molecular and physical interaction between the host and the microbiota, known as the host-microbiota interactome, is one of the key drivers in the pathophysiology of many of these disorders. This host-microbiota interactome is a set of dynamic and complex processes and needs to be treated as a distinct entity and subject of study. Disentangling this complex web of interactions will require novel approaches that combine data-driven bioinformatics with knowledge-driven computational modeling. This review describes computational approaches for investigating the host-microbiota interactome, with emphasis on the human intestinal tract and innate immunity, and highlights open challenges and existing gaps in the computational methodology for advancing our knowledge of this important facet of human health.
Affiliation(s)
- Scott Christley
- Department of Surgery, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637, USA
- Chase Cockrell
- Department of Surgery, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637, USA
- Gary An
- Department of Surgery, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637, USA
19
Decoding the EGFR mutation-induced drug resistance in lung cancer treatment by local surface geometric properties. Comput Biol Med 2014; 63:293-300. [PMID: 25035232 DOI: 10.1016/j.compbiomed.2014.06.016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 04/10/2014] [Revised: 06/03/2014] [Accepted: 06/23/2014] [Indexed: 11/21/2022]
Abstract
Epidermal growth factor receptor (EGFR) mutation-induced drug resistance limits the efficacy of tyrosine kinase inhibitors in lung cancer treatment. In this study, we explore the correlations between the local surface geometric properties of EGFR mutants and progression-free survival (PFS). The geometric properties include four types of local surface change in the EGFR mutants compared with wild-type EGFR, and the convex degree of these local surfaces. Our results show that the Spearman's rank correlation coefficients between the PFS and three types of local surface properties are all greater than 0.6 with small P-values, indicating high significance. Moreover, the number of atoms with solid angles in the ranges [0.71, 1], [0.61, 1] or [0.5, 1], which indicates the convex degree of a local EGFR surface, also shows a strong correlation with the PFS. Overall, these characteristics can be applied efficiently to the prediction of drug resistance in lung cancer treatment and extended easily to other cancer treatments.
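Spearman's coefficient, the statistic reported above, is the Pearson correlation computed on ranks, with tied values given the average of the ranks they span. A plain-Python sketch follows; the study's PFS and surface-property values are not reproduced here, so the inputs are made up.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up values standing in for a surface property and PFS (months)
surface_property = [0.12, 0.35, 0.22, 0.50, 0.41]
pfs_months = [4.0, 9.5, 6.0, 14.0, 11.0]
rho = spearman_rho(surface_property, pfs_months)
```

Here both lists have the same ordering, so rho is 1. Without ties the result equals the textbook shortcut 1 - 6Σd²/(n(n² - 1)); the Pearson-on-ranks form is used because it also handles ties correctly.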
20
Abstract
The emergence of drug resistance remains one of the most challenging issues in the treatment of HIV-1 infection. The extreme replication dynamics of HIV facilitate its escape from the selective pressure exerted by the human immune system and by the applied combination drug therapy. This article reviews computational methods whose combined use can support the design of optimal antiretroviral therapies based on viral genotypic and phenotypic data. Genotypic assays are based on the analysis of mutations associated with reduced drug susceptibility, but they are difficult to interpret because of the numerous mutations and mutational patterns that confer drug resistance. Phenotypic resistance or susceptibility can be evaluated experimentally by measuring the inhibition of viral replication in cell culture assays. However, this procedure is expensive and time-consuming.
Affiliation(s)
- Frank Cordes
- Division Scientific Computing, Department Numerical Analysis & Modeling, Konrad-Zuse-Zentrum, Takustr. 7, D-14195 Berlin-Dahlem, Germany.
21
Aiamkitsumrit B, Dampier W, Antell G, Rivera N, Martin-Garcia J, Pirrone V, Nonnemacher MR, Wigdahl B. Bioinformatic analysis of HIV-1 entry and pathogenesis. Curr HIV Res 2014; 12:132-61. [PMID: 24862329 PMCID: PMC4382797 DOI: 10.2174/1570162x12666140526121746] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 09/24/2013] [Revised: 03/18/2014] [Accepted: 05/06/2014] [Indexed: 02/07/2023]
Abstract
The evolution of human immunodeficiency virus type 1 (HIV-1) with respect to co-receptor utilization has been shown to be relevant to HIV-1 pathogenesis and disease. The CCR5-utilizing (R5) virus has been shown to be important in the very early stages of transmission and is highly prevalent during asymptomatic infection and chronic disease. In addition, the R5 virus has been proposed to be involved in neuroinvasion and central nervous system (CNS) disease. In contrast, the CXCR4-utilizing (X4) virus becomes more prevalent as disease progresses, concurrent with the loss of CD4(+) T cells. The dual-tropic virus is able to use both co-receptors (CXCR4 and CCR5) and has been thought to represent an intermediate transitional virus that possesses properties of both X4 and R5 viruses and can be encountered at many stages of disease. Computational tools and bioinformatic approaches for predicting HIV-1 co-receptor usage have been growing in importance for understanding HIV-1 pathogenesis and disease, developing diagnostic tools, and improving the efficacy of therapeutic strategies focused on blocking viral entry. Current strategies have improved the sensitivity, specificity, and reproducibility of co-receptor use prediction; however, these technologies need to be improved with respect to their efficient and accurate use across HIV-1 subtypes. The most effective approach may center on the combined use of different algorithms involving sequences within and outside of the env-V3 loop. This review focuses on the HIV-1 entry process and on co-receptor utilization, including bioinformatic tools used to predict co-receptor usage. It also provides novel preliminary analyses that enable identification of linkages between amino acids in V3 and other components of the HIV-1 genome, and demonstrates that these linkages differ between X4 and R5 viruses.
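As a hedged illustration of the simplest sequence-based co-receptor predictor, the sketch below implements the classic V3 "11/25 rule" in its commonly used form: a positively charged residue (R or K) at V3 position 11 or 25 predicts CXCR4 use. This is a well-known heuristic, not one of the review's own analyses, and single rules like it are generally considered less accurate than the combined approaches the review advocates. The example sequence is assumed to be the commonly cited subtype B V3 consensus, and the S11R variant is hypothetical.

```python
def predict_coreceptor_11_25(v3):
    """Classic 11/25 rule for HIV-1 co-receptor usage (a heuristic sketch).

    Predicts CXCR4 use ("X4") if V3 position 11 or 25 (1-based) carries a
    positively charged residue (R or K); otherwise predicts CCR5 use ("R5").
    Assumes an ungapped V3 amino acid sequence starting at the first cysteine.
    """
    if len(v3) < 25:
        raise ValueError("V3 sequence must have at least 25 residues")
    basic = {"R", "K"}
    return "X4" if v3[10] in basic or v3[24] in basic else "R5"


# Example V3 loop, assumed here to be the commonly cited subtype B consensus
v3_example = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
print(predict_coreceptor_11_25(v3_example))  # R5 (S at 11, E at 25)

# Hypothetical variant with an S11R substitution
v3_s11r = v3_example[:10] + "R" + v3_example[11:]
print(predict_coreceptor_11_25(v3_s11r))  # X4
```

Real V3 sequences often contain insertions or deletions, so production tools align to a reference before reading off positions 11 and 25; the sketch skips that step.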
Affiliation(s)
- Brian Wigdahl
- Department of Microbiology and Immunology, Drexel University College of Medicine, 245 N. 15th Street, Philadelphia, PA 19102.
22
Honarparvar B, Govender T, Maguire GEM, Soliman MES, Kruger HG. Integrated Approach to Structure-Based Enzymatic Drug Design: Molecular Modeling, Spectroscopy, and Experimental Bioactivity. Chem Rev 2013; 114:493-537. [DOI: 10.1021/cr300314q] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Indexed: 12/26/2022]
Affiliation(s)
- Bahareh Honarparvar
- Catalysis and Peptide Research Unit and School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Thavendran Govender
- Catalysis and Peptide Research Unit and School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Glenn E. M. Maguire
- Catalysis and Peptide Research Unit and School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Mahmoud E. S. Soliman
- Catalysis and Peptide Research Unit and School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
- Hendrik G. Kruger
- Catalysis and Peptide Research Unit and School of Health Sciences, University of KwaZulu-Natal, Durban 4001, South Africa
23
Beerenwinkel N, Montazeri H, Schuhmacher H, Knupfer P, von Wyl V, Furrer H, Battegay M, Hirschel B, Cavassini M, Vernazza P, Bernasconi E, Yerly S, Böni J, Klimkait T, Cellerai C, Günthard HF. The individualized genetic barrier predicts treatment response in a large cohort of HIV-1 infected patients. PLoS Comput Biol 2013; 9:e1003203. [PMID: 24009493 PMCID: PMC3757085 DOI: 10.1371/journal.pcbi.1003203] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 12/10/2012] [Accepted: 07/14/2013] [Indexed: 12/12/2022]
Abstract
The success of combination antiretroviral therapy is limited by the evolutionary escape dynamics of HIV-1. We used Isotonic Conjunctive Bayesian Networks (I-CBNs), a class of probabilistic graphical models, to describe this process. We employed partial order constraints among viral resistance mutations, which give rise to a limited set of mutational pathways, and we modeled phenotypic drug resistance as monotonically increasing along any escape pathway. Using this model, the individualized genetic barrier (IGB) to each drug is derived as the probability of the virus not acquiring additional mutations that confer resistance. Drug-specific IGBs were combined to obtain the IGB to an entire regimen, which quantifies the virus' genetic potential for developing drug resistance under combination therapy. The IGB was tested as a predictor of therapeutic outcome using between 2,185 and 2,631 treatment change episodes of subtype B infected patients from the Swiss HIV Cohort Study Database, a large observational cohort. Using logistic regression, significant univariate predictors included most of the 18 drugs and single-drug IGBs, the IGB to the entire regimen, the expert rules-based genotypic susceptibility score (GSS), several individual mutations, and the peak viral load before treatment change. In the multivariate analysis, the only genotype-derived variables that remained significantly associated with virological success were GSS and, with 10-fold stronger association, IGB to regimen. When predicting suppression of viral load below 400 cps/ml, IGB outperformed GSS and also improved GSS-containing predictors significantly, but the difference was not significant for suppression below 50 cps/ml. Thus, the IGB to regimen is a novel data-derived predictor of treatment outcome that has potential to improve the interpretation of genotypic drug resistance tests.
Affiliation(s)
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
24
Ni Z, Chen H, Qi X, Jin R. Why is Substrate Peptide Binding Unsusceptible to Multidrug-Resistant Mutations in HIV-1 Protease? A Structural and Energetic Analysis. Int J Pept Res Ther 2013. [DOI: 10.1007/s10989-013-9365-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
25
Ding B, Li N, Wang W. Characterizing Binding of Small Molecules. II. Evaluating the Potency of Small Molecules to Combat Resistance Based on Docking Structures. J Chem Inf Model 2013; 53:1213-22. [DOI: 10.1021/ci400011c] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 01/02/2023]
Affiliation(s)
- Bo Ding
- Department of Chemistry and Biochemistry, Department of Cellular and Molecular Medicine, UCSD, La Jolla, California 92093-0359, United States
- Nan Li
- Department of Chemistry and Biochemistry, Department of Cellular and Molecular Medicine, UCSD, La Jolla, California 92093-0359, United States
- Wei Wang
- Department of Chemistry and Biochemistry, Department of Cellular and Molecular Medicine, UCSD, La Jolla, California 92093-0359, United States
26
Contrasting life strategies of viruses that infect photo- and heterotrophic bacteria, as revealed by viral tagging. mBio 2012; 3:mBio.00373-12. [PMID: 23111870 PMCID: PMC3487772 DOI: 10.1128/mbio.00373-12] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.2] [Indexed: 11/29/2022]
Abstract
Ocean viruses are ubiquitous and abundant and play important roles in global biogeochemical cycles by means of their mortality, horizontal gene transfer, and manipulation of host metabolism. However, the obstacles involved in linking viruses to their hosts in a high-throughput manner bottleneck our ability to understand virus-host interactions in complex communities. We have developed a method called viral tagging (VT), which combines mixtures of host cells and fluorescent viruses with flow cytometry. We investigated multiple viruses that infect each of two model marine bacteria representing the slow-growing, photoautotrophic genus Synechococcus (Cyanobacteria) and the fast-growing, heterotrophic genus Pseudoalteromonas (Gammaproteobacteria). Overall, viral tagging results for viral infection were consistent with plaque and liquid infection assays for cyanobacterial myo-, podo- and siphoviruses and for some (myo- and podoviruses) but not all (four siphoviruses) heterotrophic bacterial viruses. The number of virus-tagged Pseudoalteromonas organisms was proportional to the added viruses under varied infection conditions (virus-bacterium ratios), while no more than 50% of the Synechococcus organisms were virus tagged even at viral abundances that exceeded those of their hosts 5- to 10-fold. Further, we found that host growth phase minimally affects the fraction of virus-tagged Synechococcus organisms while greatly affecting phage adsorption to Pseudoalteromonas. Together, these findings suggest that at least two contrasting viral life strategies exist in the oceans and that they likely reflect adaptation to their host microbes. Looking forward to the point at which the virus-tagging signature is well understood (e.g., for Synechococcus), application to natural communities should begin to provide population genomic data at the proper scale for predictively modeling two of the most abundant biological entities on Earth.
Viral study suffers from an inability to link viruses to hosts en masse, and yet delineating “who infects whom” is fundamental to viral ecology and predictive modeling. This article describes viral tagging—a high-throughput method to investigate virus-host interactions by combining the fluorescent labeling of viruses for “tagging” host cells that can be analyzed and sorted using flow cytometry. Two cultivated hosts (the cyanobacterium Synechococcus and the gammaproteobacterium Pseudoalteromonas) and their viruses (podo-, myo-, and siphoviruses) were investigated to validate the method. These lab-based experiments indicate that for most virus-host pairings, VT (viral tagging) adsorption is equivalent to traditional infection by liquid and plaque assays, with the exceptions being confined to promiscuous adsorption by Pseudoalteromonas siphoviruses. These experiments also reveal variability in life strategies across these oceanic virus-host systems with respect to infection conditions and host growth status, which highlights the need for further model system characterization to break open this virus-host interaction “black box.”
27
Abstract
Drug resistance is a common cause of treatment failure for HIV infection and cancer. The high mutation rate of HIV leads to genetic heterogeneity among viral populations and provides the seed from which drug-resistant clones emerge in response to therapy. Similarly, most cancers are characterized by extensive genetic, epigenetic, transcriptional and cellular diversity, and drug-resistant cancer cells outgrow their non-resistant peers in a process of somatic evolution. Patient-specific combination of antiviral drugs has emerged as a powerful approach for treating drug-resistant HIV infection, using genotype-based predictions to identify the best matched combination therapy among several hundred possible combinations of HIV drugs. In this Opinion article, we argue that HIV therapy provides a 'blueprint' for designing and validating patient-specific combination therapies in cancer.
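To make "the best matched combination among several hundred possible combinations" concrete, the sketch below enumerates every three-drug combination and ranks it by an additive genotypic susceptibility score (GSS), the sum of per-drug scores such as 1 (susceptible), 0.5 (intermediate) and 0 (resistant). The per-drug scores, the placeholder drug labels D1 to D5, and the purely additive scoring are all assumptions for illustration; real systems derive per-drug predictions from expert rules or statistical models and apply clinical constraints that are not modeled here.

```python
from itertools import combinations


def rank_regimens(susceptibility, size=3, top=3):
    """Score every `size`-drug combination by an additive GSS and return the best.

    susceptibility: dict drug -> per-drug score, e.g. 1 (susceptible),
    0.5 (intermediate), 0 (resistant), from some genotype interpretation.
    Returns up to `top` (score, combination) pairs, best first.
    """
    scored = [
        (sum(susceptibility[d] for d in combo), combo)
        for combo in combinations(sorted(susceptibility), size)
    ]
    # Highest GSS first; lexicographic order of drug labels breaks ties
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]


# Hypothetical per-drug scores for five placeholder drugs
susceptibility = {"D1": 1.0, "D2": 0.5, "D3": 0.0, "D4": 1.0, "D5": 0.5}
best = rank_regimens(susceptibility)
for score, combo in best:
    print(score, combo)
```

Exhaustive enumeration is feasible because the number of k-drug combinations from a drug list of modest size stays in the hundreds to low thousands; the interesting work in practice lies in the per-drug predictions, not the search.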
Affiliation(s)
- Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.
28
Doherty KM, Nakka P, King BM, Rhee SY, Holmes SP, Shafer RW, Radhakrishnan ML. A multifaceted analysis of HIV-1 protease multidrug resistance phenotypes. BMC Bioinformatics 2011; 12:477. [PMID: 22172090 PMCID: PMC3305535 DOI: 10.1186/1471-2105-12-477] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/27/2011] [Accepted: 12/15/2011] [Indexed: 12/19/2022]
Abstract
Background Great strides have been made in the effective treatment of HIV-1 with the development of second-generation protease inhibitors (PIs) that are effective against historically multi-PI-resistant HIV-1 variants. Nevertheless, mutation patterns that confer decreasing susceptibility to available PIs continue to arise within the population. Understanding the phenotypic and genotypic patterns responsible for multi-PI resistance is necessary for developing PIs that are active against clinically relevant PI-resistant HIV-1 variants. Results In this work, we use globally optimal, integer programming-based clustering techniques to elucidate multi-PI phenotypic resistance patterns using a data set of 398 HIV-1 protease sequences, each phenotyped for susceptibility to the nine clinically approved HIV-1 PIs. We validate the information content of the clusters by evaluating, with a cross-validation procedure, their ability to predict the level of decreased susceptibility to each of the available PIs. We show that, as a result of phenotypic cross-resistance, the clinical HIV-1 protease isolates considered are confined to ~6% or less of the clinically relevant phenotypic space. Clustering and feature selection methods are used to find representative sequences and mutations for major resistance phenotypes and thereby elucidate their genotypic signatures. We show that phenotypic similarity does not imply genotypic similarity: different PI-resistance mutation patterns can give rise to HIV-1 isolates with similar phenotypic profiles.
Conclusion Rather than characterizing HIV-1 susceptibility toward each PI individually, our study offers a unique perspective on the phenomenon of PI class resistance by uncovering major multidrug-resistant phenotypic patterns and their often diverse genotypic determinants. It provides a methodology that can be applied to understand clinically relevant phenotypic patterns and to aid in the design of novel inhibitors against other rapidly evolving molecular targets as well.
29
Visseaux B, Hurtado-Nedelec M, Charpentier C, Collin G, Storto A, Matheron S, Larrouy L, Damond F, Brun-Vézinet F, Descamps D. Molecular determinants of HIV-2 R5-X4 tropism in the V3 loop: development of a new genotypic tool. J Infect Dis 2011; 205:111-20. [PMID: 22140264 DOI: 10.1093/infdis/jir698] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 12/14/2022]
Abstract
OBJECTIVE The use of CCR5 inhibitors requires a tool to predict human immunodeficiency virus type 2 (HIV-2) tropism, as has been established for HIV-1. The aim of our study was to identify genotypic determinants of HIV-2 tropism located in the gp105 V3 loop. METHODS HIV-2 tropism phenotypic assays were performed on 53 HIV-2 clinical isolates using GFP-expressing human osteosarcoma T4 [GHOST(3)] cell lines expressing CD4 and either the CCR5 or the CXCR4 coreceptor. The gp105 V3 loop was sequenced and analyzed. RESULTS Thirty-four HIV-2 isolates were classified as R5, 7 as X4, and 12 as X4/R5 (dual). Substitution at residue 18 was always associated with dual/X4 tropism (P < .00001). The following determinants were associated with dual/X4 tropism: a global net charge of more than +6 (P < .00001), V19K/R mutation (P < .00001), S22A/F/Y mutation (P < .002), Q23R mutation (P < .00001), insertions at residue 24 (P < .00001), I25L/Y (P < .0004), R28K (P < .0004), and R30K (P < .014). None of these mutations was found in R5 isolates except R28K and R30K, which were detected in 4 and 5 R5 isolates, respectively. The 4 major genotypic determinants of dual/X4 tropism were mutation at residue 18, V19K/R mutation, insertions at residue 24, and V3 global net charge. CONCLUSIONS We established a strong association between HIV-2 phenotypic tropism and V3-loop sequences, allowing for the prediction of R5- and/or X4-tropic viruses in HIV-2 infection.
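The global net charge listed among the determinants is commonly computed by counting positively charged residues (K, R) as +1 and negatively charged residues (D, E) as -1. The sketch below applies that common convention and the > +6 cut-off reported in the abstract to a V3 amino-acid sequence. Treating histidine as neutral and ignoring alignment gaps are assumptions, and the function names are hypothetical; this illustrates the net-charge rule alone, not the authors' genotypic tool, which also uses residue 18, V19K/R and insertions at residue 24.

```python
def v3_net_charge(v3_sequence: str) -> int:
    """Net charge of a V3-loop amino-acid sequence.

    Assumed convention: K and R count +1, D and E count -1;
    histidine is treated as neutral and gap characters ('-') are ignored.
    """
    residues = v3_sequence.upper().replace("-", "")
    positive = sum(residues.count(aa) for aa in "KR")
    negative = sum(residues.count(aa) for aa in "DE")
    return positive - negative


def charge_suggests_dual_x4(v3_sequence: str, threshold: int = 6) -> bool:
    """True when the net charge is strictly above the threshold (+6 in the abstract).

    This single rule is only one of several reported determinants and is
    not a complete tropism predictor.
    """
    return v3_net_charge(v3_sequence) > threshold


if __name__ == "__main__":
    # Hypothetical toy strings, not real HIV-2 V3 sequences.
    print(v3_net_charge("CKRK-DE"))            # 1 (three positive, two negative)
    print(charge_suggests_dual_x4("KRKRKRK"))  # True (net charge 7 > 6)
```

A strict inequality is used because the abstract reports "more than +6", so a loop with a net charge of exactly +6 is not flagged.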
Affiliation(s)
- Benoit Visseaux
- Laboratoire de Virologie, Assistance Publique-Hôpitaux de Paris, Hôpital Bichat-Claude Bernard, France
30
[Bioinformatics studies on drug resistance against anti-HIV-1 drugs]. Uirusu 2011; 61:35-47. [PMID: 21972554 DOI: 10.2222/jsv.61.35] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
More than 20 drugs are available for anti-HIV-1 treatment in Japan. Combination therapy with these drugs has dramatically decreased AIDS morbidity and mortality. However, because of the high mutation rate of HIV-1, treating patients with ineffective drugs leads to the accumulation of mutations in the virus and the emergence of drug-resistant viruses. Thus, to select appropriate drugs for each patient living with HIV-1, bioinformatics methods have been developed that predict the level of drug resistance from viral sequence information. Furthermore, ultra-deep sequencing with next-generation sequencers, whose data analysis also relies on bioinformatics, and in silico structural modeling have been used to understand drug-resistance mechanisms. This review gives an overview of bioinformatics studies of resistance to anti-HIV-1 drugs.
31
Alcaro S, Alteri C, Artese A, Ceccherini-Silberstein F, Costa G, Ortuso F, Bertoli A, Forbici F, Santoro MM, Parrotta L, Flandre P, Masquelier B, Descamps D, Calvez V, Marcelin AG, Perno CF, Sing T, Svicher V. Docking analysis and resistance evaluation of clinically relevant mutations associated with the HIV-1 non-nucleoside reverse transcriptase inhibitors nevirapine, efavirenz and etravirine. ChemMedChem 2011; 6:2203-13. [PMID: 21953939 DOI: 10.1002/cmdc.201100362] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 07/25/2011] [Indexed: 11/07/2022]
Abstract
An integrated computational and statistical approach was used to determine the association of the non-nucleoside reverse transcriptase inhibitors (NNRTIs) nevirapine, efavirenz and etravirine with resistance mutations that cause therapeutic failure, and the impact of these mutations on NNRTI resistance. Mutations detected in nevirapine virological failure with a prevalence greater than 10% in the patient set used were K103N, Y181C, G190A, and K101E. A support vector regression model based on matched genotypic/phenotypic data (n=850) showed that, among 6365 analyzed mutations, K103N, Y181C and G190A have the first, third, and sixth greatest significance for nevirapine resistance, respectively. The most common indicator of treatment failure for efavirenz was the K103N mutation, present in 56.7% of the patients in whom the drug failed, followed by V108I, L100I, and G190A. For efavirenz resistance, K103N, G190, and L100I have the first, fourth, and eighth greatest significance, respectively, as determined by the support vector regression model. No positive interactions were observed among nevirapine resistance mutations, whereas a more complex situation, characterized by the accumulation of multiple mutations, was observed with treatment failure of efavirenz and etravirine. Docking simulations and free energy analysis based on docking scores of mutated human immunodeficiency virus (HIV) RT complexes were used to evaluate the influence of selected mutations on drug recognition. Results from support vector regression were confirmed by docking analysis. In particular, for nevirapine and efavirenz, the single mutation K103N was associated with the most unfavorable energetic profile compared with the wild-type sequence. This is in line with recent clinical data reporting that the diarylpyrimidine etravirine, a very potent third-generation drug effective against a wide range of drug-resistant HIV-1 variants, shows increased affinity towards K103N/S mutants due to its high conformational flexibility.
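The abstract relies on the usual mutation shorthand (wild-type residue, position, mutant residue, e.g. K103N, with alternatives written as K103N/S) and on a prevalence cut-off among patients with virological failure. The sketch below parses that shorthand and keeps mutations seen in more than 10% of a patient set. The record layout, the helper names and the toy patient lists are assumptions for illustration only; they are not data or code from the study.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Optional wild-type residue, position, then one or more mutant residues
# separated by "/", e.g. "K103N", "K103N/S", "M184V/I" or "41L".
_MUTATION_RE = re.compile(r"^([A-Z])?(\d+)([A-Z](?:/[A-Z])*)$")


@dataclass(frozen=True)
class Mutation:
    wild_type: Optional[str]
    position: int
    mutants: Tuple[str, ...]


def parse_mutation(text: str) -> Mutation:
    """Parse one mutation in shorthand notation; raise ValueError otherwise."""
    match = _MUTATION_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {text!r}")
    wild_type, position, mutants = match.groups()
    return Mutation(wild_type, int(position), tuple(mutants.split("/")))


def prevalent_mutations(patients: List[List[str]],
                        min_prevalence: float = 0.10) -> Dict[Mutation, float]:
    """Map each mutation to its prevalence when that prevalence exceeds min_prevalence.

    Each patient is a list of mutation strings; a mutation is counted at
    most once per patient (hence the set comprehension).
    """
    counts: Counter = Counter()
    for mutations in patients:
        counts.update({parse_mutation(m) for m in mutations})
    n = len(patients)
    return {m: c / n for m, c in counts.items() if c / n > min_prevalence}


if __name__ == "__main__":
    # Hypothetical toy data: 10 patients, 6 of them without listed mutations.
    toy_patients = [["K103N", "Y181C"], ["K103N"], ["G190A"], ["K101E"]]
    toy_patients += [[] for _ in range(6)]
    for mutation, share in prevalent_mutations(toy_patients).items():
        print(mutation, share)
```

Mixtures such as M184V/I are kept as a single key here; a study could equally split them into separate mutations, so that choice is part of the assumption.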
Affiliation(s)
- Stefano Alcaro
- Dipartimento di Scienze Farmacobiologiche, Università degli Studi Magna Graecia di Catanzaro, Complesso Ninì Barbieri, 88021 Roccelletta di Borgia (CZ), Italy
32
The development of an expert system to predict virological response to HIV therapy as part of an online treatment support tool. AIDS 2011; 25:1855-63. [PMID: 21785323 DOI: 10.1097/qad.0b013e328349a9c2] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/25/2022]
Abstract
OBJECTIVE The optimum selection and sequencing of combination antiretroviral therapy to maintain viral suppression can be challenging. The HIV Resistance Response Database Initiative has pioneered the development of computational models that predict the virological response to drug combinations. Here we describe the development and testing of random forest models to power an online treatment selection tool. METHODS A total of 5752 treatment change episodes were selected to train a committee of 10 models to predict the probability of virological response to a new regimen. The input variables were antiretroviral treatment history, baseline CD4 cell count, viral load and genotype, drugs in the new regimen, time from treatment change to follow-up, and follow-up viral load values. The models were assessed during cross-validation and with an independent set of 50 treatment change episodes by plotting receiver operating characteristic (ROC) curves, and their performance was compared with genotypic sensitivity scores from rules-based genotype interpretation systems. RESULTS The models achieved an area under the curve during cross-validation of 0.77-0.87 (mean = 0.82), accuracy of 72-81% (mean = 77%), sensitivity of 62-80% (mean = 67%) and specificity of 75-89% (mean = 81%). When tested with the 50 test cases, the area under the curve was 0.70-0.88, accuracy 64-82%, sensitivity 62-80% and specificity 68-95%. The genotypic sensitivity scores achieved an area under the curve of 0.51-0.52, overall accuracy of 54-56%, sensitivity of 43-64% and specificity of 41-73%. CONCLUSION The models achieved a consistent, high level of accuracy in predicting treatment responses, which was markedly superior to that of genotypic sensitivity scores. The models are being used to power an experimental system now available via the Internet.
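The performance figures quoted above (area under the ROC curve, accuracy, sensitivity, specificity) are all computed from predicted response probabilities and observed outcomes. The sketch below assumes a 0.5 probability cut-off for calling a predicted response and computes the AUC as the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder, with ties counting one half. The cut-off and the function names are assumptions; the abstract does not describe the study's own software.

```python
from typing import Dict, Sequence


def confusion_metrics(y_true: Sequence[bool], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity at a fixed probability cut-off."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and truth:
            tp += 1
        elif predicted:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one responder and one non-responder")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # responders correctly predicted
        "specificity": tn / (tn + fp),  # non-responders correctly predicted
    }


def roc_auc(y_true: Sequence[bool], y_score: Sequence[float]) -> float:
    """AUC via pairwise comparison (Mann-Whitney form); O(n_pos * n_neg)."""
    positives = [s for t, s in zip(y_true, y_score) if t]
    negatives = [s for t, s in zip(y_true, y_score) if not t]
    if not positives or not negatives:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    outcomes = [True, True, False, False]   # hypothetical responses
    scores = [0.9, 0.4, 0.6, 0.1]           # hypothetical model probabilities
    print(roc_auc(outcomes, scores))        # 0.75
    print(confusion_metrics(outcomes, scores))
```

The AUC does not depend on the cut-off, whereas accuracy, sensitivity and specificity do, which is one reason a range of values is reported for each.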
33
Astrovskaya I, Tork B, Mangul S, Westbrooks K, Măndoiu I, Balfe P, Zelikovsky A. Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinformatics 2011; 12 Suppl 6:S1. [PMID: 21989211 PMCID: PMC3194189 DOI: 10.1186/1471-2105-12-s6-s1] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.3] [Indexed: 02/06/2023]
Abstract
BACKGROUND RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but standard assembly software was originally designed for single-genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences. RESULTS In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. Although ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors, owing to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, the 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (they contain no internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches, whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, 2 of which were inferred without any mismatches. ViSpA source code is available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html.
CONCLUSIONS ViSpA enables accurate viral quasispecies spectrum reconstruction from 454 pyrosequencing reads. We are currently exploring extensions applicable to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.
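Two checks used in these results, viability (no internal stop codon in the reading frame) and the number of mismatches to a reference sequence, are simple to express. The sketch below assumes DNA sequences that start in frame and, for the mismatch count, sequences that are already aligned to equal length. The function names are hypothetical; this illustrates the two checks only and is not part of ViSpA.

```python
from typing import List

STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})  # standard genetic code


def in_frame_codons(sequence: str) -> List[str]:
    """Split a sequence into complete codons from position 0 (U read as T)."""
    seq = sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def has_internal_stop(sequence: str, final_codon_is_terminal: bool = True) -> bool:
    """True if a stop codon occurs inside the reading frame.

    With final_codon_is_terminal=True the last codon may be the stop that
    legitimately ends the ORF, so it is not checked; pass False for a
    fragment that does not end at the ORF's terminal codon.
    """
    codon_list = in_frame_codons(sequence)
    if final_codon_is_terminal:
        codon_list = codon_list[:-1]
    return any(codon in STOP_CODONS for codon in codon_list)


def mismatches(sequence: str, reference: str) -> int:
    """Count differing positions between two aligned, equal-length sequences."""
    if len(sequence) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(sequence.upper(), reference.upper()))


if __name__ == "__main__":
    print(has_internal_stop("ATGAAATAG"))     # False: TAG is the terminal codon
    print(has_internal_stop("ATGTAAAAATAG"))  # True: TAA interrupts the frame
    print(mismatches("ACGT", "ACGA"))         # 1
```

A reconstructed sequence would then count as viable under this sketch when `has_internal_stop` returns False, and the "within 1%" comparison corresponds to `mismatches` divided by the aligned length.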
Affiliation(s)
- Irina Astrovskaya
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
- Bassam Tork
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
- Serghei Mangul
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
- Ion Măndoiu
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Peter Balfe
- Institute of Biomedical Research, Birmingham University, Birmingham B15 2TT, UK
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
34
European guidelines on the clinical management of HIV-1 tropism testing. Lancet Infect Dis 2011; 11:394-407. [PMID: 21429803 DOI: 10.1016/s1473-3099(10)70319-4] [Citation(s) in RCA: 196] [Impact Index Per Article: 15.1] [Indexed: 11/21/2022]
35
Larder BA, Revell A, Mican JM, Agan BK, Harris M, Torti C, Izzo I, Metcalf JA, Rivera-Goba M, Marconi VC, Wang D, Coe D, Gazzard B, Montaner J, Lane HC. Clinical evaluation of the potential utility of computational modeling as an HIV treatment selection tool by physicians with considerable HIV experience. AIDS Patient Care STDS 2011; 25:29-36. [PMID: 21214377 PMCID: PMC3030912 DOI: 10.1089/apc.2010.0254] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
The HIV Resistance Response Database Initiative (RDI), which comprises a small research team in the United Kingdom and collaborating clinical centers in more than 15 countries, has used antiretroviral treatment and response data from thousands of patients around the world to develop computational models that are highly predictive of virologic response. The potential utility of such models as a tool for assisting treatment selection was assessed in two clinical pilot studies: a prospective study in Canada and Italy, which was terminated early because new drugs not covered by the system became available, and a retrospective study in the United States. For these studies, a Web-based user interface was constructed to provide access to the models. Participating physicians entered baseline data for cases of treatment failure and then registered their treatment intention. They then received a report listing the five alternative regimens that the models predicted would be most effective, plus their own selection, ranked in order of predicted virologic response. The physicians then entered their final treatment decision. Twenty-three physicians entered 114 cases (75 unique cases, 39 of which were entered twice by different physicians). Overall, 33% of treatment decisions were changed following review of the report. The final treatment decisions and the best of the RDI alternatives were predicted to produce greater virologic responses and to involve fewer drugs than the original selections. Most physicians found the system easy to use and understand. All but one indicated they would use the system if it were available, particularly for highly treatment-experienced cases with challenging resistance profiles. Despite its limitations, this first clinical evaluation of the approach by physicians with substantial HIV experience suggests that it has the potential to deliver clinical and economic benefits.
Affiliation(s)
- Brendan A. Larder
- The HIV Resistance Response Database Initiative (RDI), London, United Kingdom
- Andrew Revell
- The HIV Resistance Response Database Initiative (RDI), London, United Kingdom
- JoAnn M. Mican
- National Institute of Allergy and Infectious Diseases, Bethesda, Maryland
- Brian K. Agan
- Infectious Disease Clinical Research Program, Uniformed Services University of the Health Sciences, Bethesda, Maryland
- Julia A. Metcalf
- National Institute of Allergy and Infectious Diseases, Bethesda, Maryland
- Vincent C. Marconi
- Infectious Disease Clinical Research Program, Uniformed Services University of the Health Sciences, Bethesda, Maryland
- Dechao Wang
- The HIV Resistance Response Database Initiative (RDI), London, United Kingdom
- Daniel Coe
- The HIV Resistance Response Database Initiative (RDI), London, United Kingdom
- Brian Gazzard
- Chelsea and Westminster Hospital, London, United Kingdom
- H. Clifford Lane
- National Institute of Allergy and Infectious Diseases, Bethesda, Maryland
36
Proteochemometric modeling of the susceptibility of mutated variants of the HIV-1 virus to reverse transcriptase inhibitors. PLoS One 2010; 5:e14353. [PMID: 21179544 PMCID: PMC3002298 DOI: 10.1371/journal.pone.0014353] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 04/27/2010] [Accepted: 11/10/2010] [Indexed: 12/16/2022]
Abstract
Background Reverse transcriptase (RT) is a major drug target in highly active antiretroviral therapy (HAART) against HIV, which typically comprises two nucleoside/nucleotide analog RT inhibitors (NRTIs) in combination with a non-nucleoside RT inhibitor or a protease inhibitor. Unfortunately, HIV can escape therapy by mutating into drug-resistant variants. Computational models that correlate HIV drug susceptibilities with the virus genotype and with drug molecular properties might facilitate the selection of improved combination treatment regimens. Methodology/Principal Findings We applied our previously developed proteochemometric modeling technology to analyze the susceptibility of HIV mutants to the eight clinically approved NRTIs. The data set covered 728 virus variants genotyped for 240 sequence residues of the DNA polymerase domain of RT, 165 of which contained mutations; in total, the data set covered susceptibility data for 4,495 inhibitor-RT combinations. Inhibitors and RT sequences were represented numerically by 3D-structural and physicochemical property descriptors, respectively. The two sets of descriptors and their derived cross-terms were correlated with the susceptibility data by partial least-squares projections to latent structures. The model identified more than ten frequently occurring mutations, each conferring more than a two-fold loss of susceptibility to one or several NRTIs. The most deleterious mutations were K65R, Q151M, M184V/I, and T215Y/F, each of which decreased susceptibility to most of the NRTIs. The predictive ability of the model was estimated by cross-validation and by external predictions for new HIV variants; both procedures showed very high correlation between predicted and actual susceptibility values (Q² = 0.89 and Q²ext = 0.86).
The model is available at www.hivdrc.org as a free web service for predicting the susceptibility of any HIV-1 mutant variant to any of the clinically used NRTIs. Conclusions/Significance Our results give directions on how to develop approaches for selecting genome-based optimum combination therapy for patients harboring mutated HIV variants.
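Two quantities in this abstract can be illustrated with a short calculation: the fold loss of susceptibility (commonly the mutant IC50 divided by the wild-type IC50, so "more than two-fold" means a ratio above 2) and the cross-validated Q², commonly defined as 1 minus the predictive residual sum of squares (PRESS) divided by the total sum of squares. The sketch below uses those common definitions. The abstract does not state whether Q²ext was computed against the test-set mean or the training-set mean, so the reference mean is exposed as an assumed parameter; the function names are hypothetical.

```python
from statistics import fmean
from typing import Optional, Sequence


def fold_change(ic50_mutant: float, ic50_wild_type: float) -> float:
    """Fold loss of susceptibility: mutant IC50 relative to wild-type IC50."""
    return ic50_mutant / ic50_wild_type


def q_squared(observed: Sequence[float], predicted: Sequence[float],
              reference_mean: Optional[float] = None) -> float:
    """Q^2 = 1 - PRESS / SS for out-of-sample (cross-validated) predictions.

    By default SS is taken around the mean of the observed values; pass
    reference_mean (e.g. a training-set mean) to use another convention.
    """
    mean = fmean(observed) if reference_mean is None else reference_mean
    press = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    total = sum((y - mean) ** 2 for y in observed)
    return 1.0 - press / total


if __name__ == "__main__":
    print(fold_change(5.0, 2.0) > 2.0)                  # True: a 2.5-fold loss
    print(q_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0: no better than the mean
```

A Q² of 0 means the predictions do no better than always guessing the mean, which is why values near 0.9 indicate strong predictive ability.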
37
Abstract
Viruses are fast-evolving pathogens that continuously adapt to the highly variable environments in which they live and reproduce. Strategies aimed at inhibiting virus replication and controlling viral spread among hosts need to cope with these extremely heterogeneous populations and with their potential to evade medical interventions. Computational techniques such as phylogenetic methods have broadened our picture of viral evolution in both time and space, and mathematical modeling has contributed substantially to our progress in unraveling the dynamics of virus replication, fitness, and virulence. Integrating multiple computational and mathematical approaches with experimental data can help predict the behavior of viral pathogens and anticipate their escape dynamics. This information plays a critical role in some aspects of vaccine development, such as viral strain selection for vaccination or rational attenuation of viruses. Here we review several aspects of viral evolution that can be addressed quantitatively, and we discuss computational methods that have the potential to improve vaccine design.
Affiliation(s)
- Samuel Ojosnegros
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
38
Theys K, Deforche K, Libin P, Camacho RJ, Van Laethem K, Vandamme AM. Resistance pathways of human immunodeficiency virus type 1 against the combination of zidovudine and lamivudine. J Gen Virol 2010; 91:1898-1908. [PMID: 20410311 DOI: 10.1099/vir.0.022657-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
A better understanding of human immunodeficiency virus type 1 drug-resistance evolution under the selective pressure of combination treatment is important for the design of long-term effective treatment strategies. We applied Bayesian network learning to sequences from patients treated with the reverse transcriptase inhibitor combination of zidovudine (AZT) and lamivudine (3TC) to identify the role of many treatment-selected mutations in the development of resistance. Based on the Bayesian network structure, an in vivo fitness landscape was built, reflecting the selective pressure under treatment necessary to evolve naive sequences into sequences obtained from patients treated with the combination. This landscape, combined with an evolutionary model, was used to predict resistance evolution in longitudinal sequence pairs. In our analysis, mutations 41L, 70R, 184V and 215F/Y were identified as major resistance mutations to the combination of AZT and 3TC, as they were directly associated with treatment experience. The network also suggested a possible role in resistance development for a number of novel mutations. Fitness estimated using the landscape correlated significantly with in vitro resistance phenotype in genotype-phenotype pairs (R² = 0.70). Variation in predicted evolution under selective pressure correlated significantly with observed in vivo evolution during AZT plus 3TC treatment. In conclusion, we confirmed current knowledge on resistance development to the combination of AZT and 3TC and identified additional novel mutations. Moreover, a model to predict resistance evolution during AZT and 3TC treatment has been built and validated.
Affiliation(s)
- K Theys
- Rega Institute for Medical Research, Katholieke Universiteit Leuven, Leuven, Belgium
- P Libin
- MyBioData, Rotselaar, Belgium
- R J Camacho
- Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
- K Van Laethem
- Rega Institute for Medical Research, Katholieke Universiteit Leuven, Leuven, Belgium
- A-M Vandamme
- Rega Institute for Medical Research, Katholieke Universiteit Leuven, Leuven, Belgium
39
Alcaro S, Artese A, Ceccherini-Silberstein F, Ortuso F, Perno CF, Sing T, Svicher V. Molecular Dynamics and Free Energy Studies on the Wild-Type and Mutated HIV-1 Protease Complexed with Four Approved Drugs: Mechanism of Binding and Drug Resistance. J Chem Inf Model 2009; 49:1751-61. [DOI: 10.1021/ci900012k] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Affiliation(s)
- Stefano Alcaro, Anna Artese, Francesca Ceccherini-Silberstein, Francesco Ortuso, Carlo Federico Perno, Tobias Sing, Valentina Svicher
- Laboratorio di Chimica Farmaceutica Computazionale - Dipartimento di Scienze Farmacobiologiche, Università “Magna Græcia” di Catanzaro, Campus Universitario, Viale Europa, 88100 Catanzaro, Italy, Dipartimento di Medicina Sperimentale e Biochimica, Università “Tor Vergata”, Via Montpellier, 1, 00133, Roma, Italy, and Max-Planck-Institute for Informatics, Saarbrücken, Germany
40
Wang D, Larder B, Revell A, Montaner J, Harrigan R, De Wolf F, Lange J, Wegner S, Ruiz L, Pérez-Elías MJ, Emery S, Gatell J, D'Arminio Monforte A, Torti C, Zazzi M, Lane C. A comparison of three computational modelling methods for the prediction of virological response to combination HIV therapy. Artif Intell Med 2009; 47:63-74. [PMID: 19524413 DOI: 10.1016/j.artmed.2009.05.002] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 08/21/2008] [Revised: 04/16/2009] [Accepted: 05/10/2009] [Indexed: 11/19/2022]
Abstract
OBJECTIVE HIV treatment failure is commonly associated with drug resistance, and the selection of a new regimen is often guided by genotypic resistance testing. The interpretation of complex genotypic data poses a major challenge. We have developed artificial neural network (ANN) models that predict virological response to therapy from HIV genotype and other clinical information. Here we compare the accuracy of ANN with alternative modelling methodologies, random forests (RF) and support vector machines (SVM). METHODS Data from 1204 treatment change episodes (TCEs) were identified from the HIV Resistance Response Database Initiative (RDI) database and partitioned at random into a training set of 1154 and a test set of 50. The training set was then partitioned using an L-fold cross-validation scheme (L = 10 in this study) for training individual computational models. Seventy-six input variables were used for training the models: 55 baseline genotype mutations, the 14 potential drugs in the new treatment regimen, four treatment history variables, baseline viral load, CD4 count and time to follow-up viral load. The output variable was follow-up viral load. Performance was evaluated in terms of the correlations and absolute differences between the individual models' predictions and the actual change in viral load (DeltaVL). RESULTS The correlations (r²) between predicted and actual DeltaVL varied from 0.318 to 0.546 for ANN, 0.590 to 0.751 for RF and 0.300 to 0.720 for SVM. The mean absolute differences varied from 0.677 to 0.903 for ANN, 0.494 to 0.644 for RF and 0.500 to 0.790 for SVM. ANN models were significantly inferior to RF and SVM models. The predictions of the ANN, RF and SVM committees all correlated highly significantly with the actual DeltaVL of the independent test TCEs, producing r² values of 0.689, 0.707 and 0.620, respectively. The mean absolute differences were 0.543, 0.600 and 0.607 log10 copies/ml for ANN, RF and SVM, respectively.
There were no statistically significant differences between the three committees. Combining the committees' outputs improved correlations between predicted and actual virological responses; the combination of all three committees gave r² = 0.728. The mean absolute differences followed a similar pattern. CONCLUSIONS RF and SVM models can produce predictions of virological response to HIV treatment that are comparable in accuracy to those of a committee of ANN models. Combining the predictions of different models improves their accuracy somewhat. This approach has potential as a future clinical tool, and a combination of ANN and RF models is being taken forward for clinical evaluation.
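The evaluation in this abstract rests on three simple calculations: the squared correlation (r²) between predicted and actual DeltaVL, the mean absolute difference in log10 copies/ml, and combining the predictions of several committees. The sketch below implements r² as the squared Pearson correlation and combines committees by an unweighted per-case mean. The abstract does not say how the committees' outputs were combined, so the averaging rule is an assumption, and the function names and values in the usage lines are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def mean_absolute_difference(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| (log10 copies/ml when applied to DeltaVL)."""
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between actual and predicted values."""
    mean_a, mean_p = fmean(actual), fmean(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


def combine_committees(*committee_predictions: Sequence[float]) -> List[float]:
    """Unweighted per-case mean of several committees' predictions (assumed rule)."""
    return [fmean(case) for case in zip(*committee_predictions)]


if __name__ == "__main__":
    actual = [-2.0, -1.0, 0.0]   # hypothetical DeltaVL values
    ann = [-1.8, -1.2, 0.1]      # hypothetical committee predictions
    rf = [-2.2, -0.8, -0.1]
    combined = combine_committees(ann, rf)
    print(combined, mean_absolute_difference(actual, combined))
    print(r_squared(actual, ann), r_squared(actual, combined))
```

Note that r² rewards predictions that move in step with the actual values even if they are offset, while the mean absolute difference penalizes any offset; reporting both, as the abstract does, covers both kinds of error.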
Affiliation(s)
- Dechao Wang
- The HIV Resistance Response Database Initiative (RDI), London, UK
41
Frange P, Galimand J, Goujard C, Deveau C, Ghosn J, Rouzioux C, Meyer L, Chaix ML. High frequency of X4/DM-tropic viruses in PBMC samples from patients with primary HIV-1 subtype-B infection in 1996-2007: the French ANRS CO06 PRIMO Cohort Study. J Antimicrob Chemother 2009; 64:135-41. [DOI: 10.1093/jac/dkp151] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
42
Chueca N, Garrido C, Alvarez M, Poveda E, de Dios Luna J, Zahonero N, Hernández-Quero J, Soriano V, Maroto C, de Mendoza C, García F. Improvement in the determination of HIV-1 tropism using the V3 gene sequence and a combination of bioinformatic tools. J Med Virol 2009; 81:763-7. [PMID: 19319937 DOI: 10.1002/jmv.21425] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Indexed: 12/17/2022]
Affiliation(s)
- Natalia Chueca
- Microbiology Department, Hospital Universitario San Cecilio, Granada, Spain
43
Presence of HIV-1 R5 Viruses in Cerebrospinal Fluid Even in Patients Harboring R5X4/X4 Viruses in Plasma. J Acquir Immune Defic Syndr 2009; 51:60-4. [DOI: 10.1097/qai.0b013e31819fb903] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/26/2022]
44
Hou T, Zhang W, Wang J, Wang W. Predicting drug resistance of the HIV-1 protease using molecular interaction energy components. Proteins 2009; 74:837-46. [PMID: 18704937 DOI: 10.1002/prot.22192] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Indexed: 11/11/2022]
Abstract
Drug resistance significantly impairs the efficacy of AIDS therapy. Therefore, precise prediction of resistant viral mutants is particularly useful for developing effective drugs and designing therapeutic regimens. In this study, we applied a structure-based computational approach to predict mutants of the HIV-1 protease resistant to the seven FDA-approved drugs. We analyzed the energetic pattern of the protease-drug interaction by calculating the molecular interaction energy components (MIECs) between the drug and the protease residues. Support vector machines (SVMs) were trained on MIECs to classify protease mutants into resistant and nonresistant categories. The high prediction accuracies for the test sets of cross-validations suggested that the MIECs successfully characterized the interaction interface between drugs and the HIV-1 protease. We conducted a proof-of-concept study on a newly approved drug, darunavir (TMC114), for which no drug resistance data were available in the public domain. Compared with amprenavir, our analysis suggested that darunavir might be better able to combat drug resistance. To quantitatively estimate binding affinities of drugs and study the contributions of protease residues to resistance, linear regression models were trained on MIECs using partial least squares (PLS). The MIEC-PLS models also achieved satisfactory prediction accuracy. Analysis of the fitting coefficients of MIECs in the regression model revealed the important resistance mutations and shed light on the mechanisms by which these mutations cause resistance. Our study demonstrated the advantages of characterizing the protease-drug interaction using MIECs. We believe that MIEC-SVM and MIEC-PLS can help design new agents or combinations of therapeutic regimens to counter HIV-1 protease resistant strains.
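As a rough illustration of the classification step (not the authors' code), the sketch below trains a linear SVM with Pegasos-style stochastic subgradient descent, cycling through the data in a fixed order for reproducibility. The abstract does not specify the kernel or solver, so the linear model is an assumption. The two-dimensional "energy component" vectors and labels are hypothetical stand-ins for per-residue MIECs, and the bias is handled by appending a constant feature, which means it is regularized too (a simplification relative to a standard SVM solver).

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=50):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by subgradient steps."""
    dim = len(samples[0]) + 1              # +1 for the constant bias feature
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y is +1 (resistant) or -1
            t += 1
            xb = list(x) + [1.0]
            eta = 1.0 / (lam * t)          # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            # shrink w (subgradient of the L2 regularizer) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and add the hinge-loss subgradient if the margin is violated
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    """Return +1 (resistant) or -1 (nonresistant) for feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical 2-feature MIEC-like vectors (arbitrary units)
resistant = [(2.0, 1.0), (2.5, -0.5), (3.0, 0.5), (2.2, 0.0)]
nonresistant = [(-2.0, -1.0), (-2.5, 0.5), (-3.0, -0.5), (-2.2, 0.0)]
X = resistant + nonresistant
y = [1] * len(resistant) + [-1] * len(nonresistant)
w = train_linear_svm(X, y)
```

Real MIEC vectors would have one component per protease residue and energy term, and a kernel SVM from an established library would normally replace this hand-written solver.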
Affiliation(s)
- Tingjun Hou
- Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093, USA
45
Genotypic antiretroviral resistance testing for human immunodeficiency virus type 1 integrase inhibitors by use of the TruGene sequencing system. J Clin Microbiol 2008; 46:4087-90. [PMID: 18945845 DOI: 10.1128/jcm.01246-08] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
A sequencing assay for detection of mutations conferring resistance to the human immunodeficiency virus type 1 (HIV-1) integrase inhibitors raltegravir and elvitegravir was developed using the automated TruGene sequencing system. The assay returned clear sequencing results for samples with ≥500 RNA copies/ml, allowing mutation detection and HIV-1 subtyping across a spectrum of HIV-1 subtypes.
46
Primary genotypic resistance of HIV-1 to CCR5 antagonists in CCR5 antagonist treatment-naive patients. AIDS 2008; 22:2212-4. [PMID: 18832886 DOI: 10.1097/qad.0b013e328313bf9c] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/27/2022]
Abstract
Resistance to CCR5 antagonists can be driven by mutations in gp120. Sequences from 323 CCR5 antagonist-naive patients were analyzed for the presence of previously described in-vivo or in-vitro resistance mutations to CCR5 antagonists located in the V3 loop of gp120. The V3 loop region was fairly polymorphic, and 7.3% of patients harbored viruses with combinations of V3 loop mutations previously described as being involved in resistance to maraviroc, a licensed CCR5 antagonist.
47
Abstract
OBJECTIVE To develop an improved model for the genetic basis of reduced susceptibility to tenofovir in vitro. METHODS A dataset of 532 HIV-1 subtype B reverse transcriptase genotypes with matched phenotypic susceptibility data was assembled, both as a continuous (transformed) dataset and as a categorical dataset generated by imposing a 1.4-fold cut-off based on earlier studies of in-vivo response. Models were generated using stepwise regression, decision tree and random forest approaches on both the continuous and categorical data. Models were compared by mean squared error (MSE; continuous models) or by misclassification rate (categorical models), using nested cross-validation. RESULTS From the continuous dataset, stepwise linear regression, regression tree and regression forest methods yielded models with MSE of 0.46, 0.48 and 0.42, respectively. Amino acids 215, 65, 41, 67, 184 and 151 in HIV-1 reverse transcriptase were identified in all three models and amino acid 210 in two. The categorical data yielded logistic regression, classification tree and forest models with misclassification rates of 26, 24 and 23%, respectively. Amino acids 215, 65 and 67 appeared in all three; 41, 184, 210 and 151 were also included in the classification forest model. CONCLUSION The random forests approach has yielded a substantial improvement in the available models to describe the genetic basis of reduced susceptibility to tenofovir in vitro. The most important sites in these models are amino acid sites 215, 65, 41, 67, 184, 151 and 210 in HIV-1 reverse transcriptase.
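The categorical arm of this analysis can be illustrated with a much simpler model than the paper's trees and forests. The sketch below is a hypothetical illustration, not the authors' method: it dichotomizes fold-change values at the 1.4-fold cut-off (assuming that values strictly above 1.4 count as reduced susceptibility) and then picks the single reverse transcriptase position whose mutation status best separates the two classes. This is a one-split classification tree (a decision stump) scored by misclassification rate. The genotypes and fold changes are invented.

```python
FOLD_CHANGE_CUTOFF = 1.4   # cut-off quoted in the abstract


def categorize(fold_changes, cutoff=FOLD_CHANGE_CUTOFF):
    """True = reduced susceptibility (assumed: fold change strictly above cut-off)."""
    return [fc > cutoff for fc in fold_changes]


def stump_errors(genotypes, labels, site):
    """Misclassifications when splitting on 'mutation at site' and
    predicting the majority class in each branch."""
    counts = {True: [0, 0], False: [0, 0]}   # branch -> [not reduced, reduced]
    for mutated_sites, reduced in zip(genotypes, labels):
        counts[site in mutated_sites][int(reduced)] += 1
    # each branch misclassifies its minority class
    return sum(min(branch) for branch in counts.values())


def best_split(genotypes, labels):
    """Return (RT position, misclassification rate) of the best single split."""
    sites = sorted(set().union(*genotypes))
    best = min(sites, key=lambda s: stump_errors(genotypes, labels, s))
    return best, stump_errors(genotypes, labels, best) / len(labels)


# Hypothetical data: mutated RT positions per isolate and phenotypic fold change
genotypes = [{65}, {65, 184}, {215}, {184}, set(), {41}]
fold_changes = [2.0, 1.8, 1.5, 0.6, 1.0, 1.1]
labels = categorize(fold_changes)
site, error_rate = best_split(genotypes, labels)
```

A random forest aggregates many deeper trees grown on resampled data, and the abstract's nested cross-validation would estimate the error on held-out isolates rather than on the training set as done here.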
48
Eriksson N, Pachter L, Mitsuya Y, Rhee SY, Wang C, Gharizadeh B, Ronaghi M, Shafer RW, Beerenwinkel N. Viral population estimation using pyrosequencing. PLoS Comput Biol 2008; 4:e1000074. [PMID: 18437230 PMCID: PMC2323617 DOI: 10.1371/journal.pcbi.1000074] [Citation(s) in RCA: 183] [Impact Index Per Article: 11.4] [Received: 07/02/2007] [Accepted: 03/27/2008] [Indexed: 12/20/2022]
Abstract
The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an expectation–maximization (EM) algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.
The genetic diversity of viral populations is important for biomedical problems such as disease progression, vaccine design, and drug resistance, yet it is not generally well understood. In this paper, we use pyrosequencing, a novel DNA sequencing technique, to reconstruct viral populations. Pyrosequencing produces DNA sequences, called reads, in numbers much greater than standard DNA sequencing techniques. However, these reads are substantially shorter and more error-prone than those obtained from standard sequencing techniques.
Therefore, pyrosequencing data requires new methods of analysis. Here, we develop mathematical and statistical tools for reconstructing viral populations using pyrosequencing. To this end, we show how to correct errors in the reads and assemble them into the different viral strains present in the population. We apply these methods to HIV-1 populations from drug-resistant patients and show that our techniques produce results quite close to accepted techniques at a lower cost and potentially higher resolution.
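The frequency-estimation step can be sketched as a standard EM algorithm for mixture proportions. This simplified version assumes that error correction and haplotype construction have already produced a candidate haplotype set, that each read is summarized only by the set of haplotypes it is compatible with, and that a read is equally likely to come from any compatible haplotype; the paper's actual model may weight reads differently. The function name and toy reads are hypothetical.

```python
def em_haplotype_frequencies(reads, haplotypes, max_iter=500, tol=1e-10):
    """EM estimate of haplotype frequencies.

    reads: list of sets; each set holds the candidate haplotypes a read is
    compatible with (assumed: a read is equally likely under each of them).
    """
    reads = [r for r in reads if r]          # drop reads matching nothing
    n = len(reads)
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(max_iter):
        # E-step: split each read among its compatible haplotypes
        expected = {h: 0.0 for h in haplotypes}
        for compatible in reads:
            total = sum(freqs[h] for h in compatible)
            for h in compatible:
                expected[h] += freqs[h] / total
        # M-step: new frequency = expected read count / number of reads
        new_freqs = {h: expected[h] / n for h in haplotypes}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haplotypes)
        freqs = new_freqs
        if change < tol:
            break
    return freqs


# Hypothetical toy data: 4 reads fit only A, 2 only B, 3 fit both A and B
reads = [{"A"}] * 4 + [{"B"}] * 2 + [{"A", "B"}] * 3
estimates = em_haplotype_frequencies(reads, ["A", "B", "C"])
```

In the toy data, the three ambiguous reads are split between A and B in proportion to the current estimates, so the estimates converge to the 4:2 ratio of the unambiguous reads, while C, which no read supports, drops to zero after the first iteration.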
Affiliation(s)
- Nicholas Eriksson
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Lior Pachter
- Department of Mathematics, University of California, Berkeley, California, United States of America
- Yumi Mitsuya
- Division of Infectious Diseases, Stanford University Medical Center, Stanford, California, United States of America
- Soo-Yon Rhee
- Division of Infectious Diseases, Stanford University Medical Center, Stanford, California, United States of America
- Chunlin Wang
- Division of Infectious Diseases, Stanford University Medical Center, Stanford, California, United States of America
- Baback Gharizadeh
- Genome Technology Center, Stanford University, Palo Alto, California, United States of America
- Mostafa Ronaghi
- Genome Technology Center, Stanford University, Palo Alto, California, United States of America
- Robert W. Shafer
- Division of Infectious Diseases, Stanford University Medical Center, Stanford, California, United States of America
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
49
Lapins M, Eklund M, Spjuth O, Prusis P, Wikberg JES. Proteochemometric modeling of HIV protease susceptibility. BMC Bioinformatics 2008; 9:181. [PMID: 18402661 PMCID: PMC2375133 DOI: 10.1186/1471-2105-9-181] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 12/21/2007] [Accepted: 04/10/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND A major obstacle in the treatment of HIV is the ability of the virus to mutate rapidly into drug-resistant variants. A method for predicting the susceptibility of mutated HIV strains to antiviral agents would provide substantial clinical benefit as well as facilitate the development of new candidate drugs. Therefore, we used proteochemometrics to model the susceptibility of HIV to protease inhibitors in current use, utilizing descriptions of the physico-chemical properties of mutated HIV proteases and 3D structural property descriptions for the protease inhibitors. The descriptions were correlated to the susceptibility data of 828 unique HIV protease variants for seven protease inhibitors in current use; the data set comprised 4792 protease-inhibitor combinations. RESULTS The model provided excellent predictability (R² = 0.92, Q² = 0.87) and identified general and specific features of drug resistance. The model's predictive ability was verified by external prediction, in which the susceptibilities to each of the seven inhibitors were omitted from the data set one inhibitor at a time, and the data for the six remaining compounds were used to create new models. This analysis showed that the overall predictive ability for the omitted inhibitors was Q² (inhibitors) = 0.72. CONCLUSION Our results show that a proteochemometric approach can provide generalized susceptibility predictions for new inhibitors. Our proteochemometric model can directly analyze inhibitor-protease interactions and facilitate treatment selection based on viral genotype. The model is available for public use at the HIV Drug Research Centre.
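The distinction between the fitted R² and the cross-validated Q² reported here can be made concrete with a one-variable least-squares fit, used purely as a stand-in for the authors' multivariate proteochemometric model. The sketch computes `R² = 1 - RSS/SS_tot` on the training data and `Q² = 1 - PRESS/SS_tot` from leave-one-out predictions. The data and function names are hypothetical, and note that the paper's external validation left out a whole inhibitor at a time rather than a single observation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def r2_and_q2(xs, ys):
    """Fitted R² and leave-one-out Q² for a simple linear regression."""
    n = len(ys)
    my = sum(ys) / n
    ss_tot = sum((y - my) ** 2 for y in ys)
    a, b = fit_line(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    press = 0.0   # predicted residual sum of squares (leave-one-out)
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2
    return 1 - rss / ss_tot, 1 - press / ss_tot


# Hypothetical descriptor values and log-susceptibility measurements
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_susceptibility = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
r2, q2 = r2_and_q2(descriptor, log_susceptibility)
```

For ordinary least squares, each left-out residual is at least as large in magnitude as the corresponding fitted residual, so Q² cannot exceed R² in this sketch; a wide gap between the two usually signals overfitting.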
Affiliation(s)
- Maris Lapins
- Department of Pharmaceutical Pharmacology, Uppsala University, SE-751 24, Sweden.
50
Soulié C, Calvez V. [HIV tropism assays when first CCR5-antagonist becomes available]. Med Mal Infect 2008; 38 Suppl 1:S7-11. [PMID: 18455056 DOI: 10.1016/s0399-077x(08)70538-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
Since the discovery of its co-receptors, HIV tropism has been defined by the type of co-receptor the virus uses to infect its host cells: R5 for viruses using only CCR5, X4 for viruses using only CXCR4, and R5/X4 or dual for viruses able to use either CCR5 or CXCR4. Tropism is currently determined with recombinant phenotypic assays, which are labour-intensive, time-consuming and expensive, and available in only a few highly specialized laboratories. Prescription of CCR5 antagonists requires screening for the presence of X4 variants before starting therapy. With the development of this new class of antiretrovirals, there is a need for genotypic tests based on sequencing of the V3 loop of the gp120 env gene and interpretation of the sequence with an algorithm. As with resistance tests 10 years ago, these genotypic tests should be easier to introduce into routine clinical practice. Genotypic tropism assays will help to select patients before CCR5-antagonist prescription and to monitor patients treated with this class of antiretrovirals for a potential switch of tropism during therapy.
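As an example of the kind of interpretation algorithm mentioned above, the sketch below applies a simplified form of the "11/25 rule", a classic genotypic heuristic that flags CXCR4 use when a basic residue (arginine, R, or lysine, K) occupies V3 position 11 or 25. Published variants of the rule differ in which residues count at each position and whether net charge is also considered, so the exact criterion here is an assumption. The function name is hypothetical, and the input is assumed to be a 35-residue V3 sequence already aligned to the reference with gaps removed. The example sequence simply has serine at position 11 and glutamate at position 25.

```python
BASIC_RESIDUES = {"R", "K"}   # arginine, lysine
V3_LENGTH = 35


def predict_tropism_11_25(v3: str) -> str:
    """Simplified 11/25 rule on an aligned, gap-free 35-residue V3 sequence."""
    seq = v3.strip().upper()
    if len(seq) != V3_LENGTH:
        raise ValueError(f"expected {V3_LENGTH} residues, got {len(seq)}")
    # Positions 11 and 25 are 1-based, so they are indices 10 and 24 here.
    if seq[10] in BASIC_RESIDUES or seq[24] in BASIC_RESIDUES:
        return "X4-capable"   # X4 or dual/mixed (R5/X4); not distinguished
    return "R5"


example_v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"   # S at 11, E at 25
variant_v3 = example_v3[:10] + "R" + example_v3[11:]  # S11R substitution
```

A rule this crude misses many CXCR4-using viruses, which is one reason validated interpretation algorithms were needed before genotypic tropism testing could guide CCR5-antagonist prescription.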
Affiliation(s)
- C Soulié
- Laboratoire de Virologie, CERVI, Hôpital de la Pitié-Salpêtrière, Paris cedex 13, France