101
Suh KS, Sarojini S, Youssif M, Nalley K, Milinovikj N, Elloumi F, Russell S, Pecora A, Schecter E, Goy A. Tissue banking, bioinformatics, and electronic medical records: the front-end requirements for personalized medicine. Journal of Oncology 2013; 2013:368751. [PMID: 23818899 PMCID: PMC3683471 DOI: 10.1155/2013/368751] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 02/07/2013] [Revised: 05/03/2013] [Accepted: 05/07/2013] [Indexed: 11/26/2022]
Abstract
Personalized medicine promises patient-tailored treatments that enhance patient care and decrease overall treatment costs by focusing on genetics and "-omics" data obtained from patient biospecimens and records to guide therapy choices that generate good clinical outcomes. The approach relies on diagnostic and prognostic use of novel biomarkers discovered through combinations of tissue banking, bioinformatics, and electronic medical records (EMRs). The analytical power of bioinformatic platforms combined with patient clinical data from EMRs can reveal potential biomarkers and clinical phenotypes that allow researchers to develop experimental strategies using selected patient biospecimens stored in tissue banks. For cancer, high-quality biospecimens collected at diagnosis, first relapse, and various treatment stages provide crucial resources for study designs. To enlarge biospecimen collections, patient education regarding the value of specimen donation is vital. One approach for increasing consent is to offer publicly available illustrations and game-like engagements demonstrating how wider sample availability facilitates development of novel therapies. The critical value of tissue bank samples, bioinformatics, and EMRs in the early stages of the biomarker discovery process for personalized medicine is often overlooked. The data obtained also require cross-disciplinary collaborations to translate experimental results into clinical practice and diagnostic and prognostic use in personalized medicine.
Affiliation(s)
- K. Stephen Suh
- The Genomics and Biomarkers Program, The John Theurer Cancer Center at Hackensack, University Medical Center, D. Jurist Research Building, 40 Prospect Avenue, Hackensack, NJ 07601, USA
- Sreeja Sarojini
- The Genomics and Biomarkers Program, The John Theurer Cancer Center at Hackensack, University Medical Center, D. Jurist Research Building, 40 Prospect Avenue, Hackensack, NJ 07601, USA
- Maher Youssif
- The Genomics and Biomarkers Program, The John Theurer Cancer Center at Hackensack, University Medical Center, D. Jurist Research Building, 40 Prospect Avenue, Hackensack, NJ 07601, USA
- Kip Nalley
- Sophic Systems Alliance Inc., 20271 Goldenrod Lane, Germantown, MD 20876, USA
- Natasha Milinovikj
- The Genomics and Biomarkers Program, The John Theurer Cancer Center at Hackensack, University Medical Center, D. Jurist Research Building, 40 Prospect Avenue, Hackensack, NJ 07601, USA
- Fathi Elloumi
- Sophic Systems Alliance Inc., 20271 Goldenrod Lane, Germantown, MD 20876, USA
- Steven Russell
- Siemens Corporate Research, IT Platforms, Princeton, NJ 08540, USA
- Andrew Pecora
- The Genomics and Biomarkers Program, The John Theurer Cancer Center at Hackensack, University Medical Center, D. Jurist Research Building, 40 Prospect Avenue, Hackensack, NJ 07601, USA
- Andre Goy
- The Genomics and Biomarkers Program, The John Theurer Cancer Center at Hackensack, University Medical Center, D. Jurist Research Building, 40 Prospect Avenue, Hackensack, NJ 07601, USA

102
Capriotti E, Calabrese R, Fariselli P, Martelli PL, Altman RB, Casadio R. WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation. BMC Genomics 2013; 14 Suppl 3:S6. [PMID: 23819482 PMCID: PMC3665478 DOI: 10.1186/1471-2164-14-s3-s6] [Citation(s) in RCA: 248] [Impact Index Per Article: 20.7] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and, for a given protein, its input comprises the sequence and/or its three-dimensional structure (when available), a set of target variations, and its functional Gene Ontology (GO) terms. For each protein variation, the server outputs the probability of its association with human disease. RESULTS The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO(3d) programs. Sequence- and structure-based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. On a balanced dataset of more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, a correlation coefficient of 0.61, and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method achieves 84% overall accuracy, a correlation coefficient of 0.68, and an AUC of 0.91. When tested on a new blind set of variations, the server reaches 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. CONCLUSIONS WS-SNPs&GO is a valuable tool that integrates, in a single framework, information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go.
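The three figures quoted in this abstract (overall accuracy, correlation coefficient, and ROC AUC) can be derived from a confusion matrix and a list of prediction scores. The following is a minimal sketch and not the authors' evaluation code: the function names are hypothetical, and the correlation coefficient is assumed here to be the Matthews correlation coefficient.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all variants that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (coefficient undefined).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def roc_auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen disease variant
    (label 1) scores higher than a randomly chosen neutral one (label 0).
    Ties count as half a win. Quadratic in the input size, so this is
    only suitable for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On a balanced dataset such as the one described, accuracy and the correlation coefficient tend to move together; on a skewed dataset the correlation coefficient is usually the more informative of the two.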
Affiliation(s)
- Emidio Capriotti
- Division of Informatics, Department of Pathology, University of Alabama at Birmingham, Birmingham AL, USA.

103
Capriotti E, Altman RB, Bromberg Y. Collective judgment predicts disease-associated single nucleotide variants. BMC Genomics 2013; 14 Suppl 3:S2. [PMID: 23819846 PMCID: PMC3839641 DOI: 10.1186/1471-2164-14-s3-s2] [Citation(s) in RCA: 186] [Impact Index Per Article: 15.5] [Indexed: 12/11/2022] Open
Abstract
Background In recent years the number of human genetic variants deposited into the publicly available databases has been increasing exponentially. The latest version of dbSNP, for example, contains ~50 million validated Single Nucleotide Variants (SNVs). SNVs make up most of human variation and are often the primary causes of disease. The non-synonymous SNVs (nsSNVs) result in single amino acid substitutions and may affect protein function, often causing disease. Although several methods for the detection of nsSNV effects have already been developed, the consistent increase in annotated data is offering the opportunity to improve prediction accuracy. Results Here we present a new approach for the detection of disease-associated nsSNVs (Meta-SNP) that integrates four existing methods: PANTHER, PhD-SNP, SIFT and SNAP. We first tested the accuracy of each method using a dataset of 35,766 disease-annotated mutations from 8,667 proteins extracted from the SwissVar database. The four methods reached overall accuracies of 64%-76% with a Matthews correlation coefficient (MCC) of 0.38-0.53. We then used the outputs of these methods to develop a machine learning based approach that discriminates between disease-associated and polymorphic variants (Meta-SNP). In testing, the combined method reached 79% overall accuracy and 0.59 MCC, ~3% higher accuracy and ~0.05 higher correlation with respect to the best-performing method. Moreover, for the hardest-to-define subset of nsSNVs, i.e. variants for which half of the predictors disagreed with the other half, Meta-SNP attained 8% higher accuracy than the best predictor. Conclusions Here we find that the Meta-SNP algorithm achieves better performance than the best single predictor. This result suggests that the methods used for the prediction of variant-disease associations are orthogonal, encoding different biologically relevant relationships. Careful combination of predictions from various resources is therefore a good strategy for the selection of high reliability predictions. Indeed, for the subset of nsSNVs where all predictors were in agreement (46% of all nsSNVs in the set), our method reached 87% overall accuracy and 0.73 MCC. Meta-SNP server is freely accessible at http://snps.biofold.org/meta-snp.
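The abstract distinguishes variants on which all four predictors agree from variants on which they split two against two. The sketch below shows one way such subsets could be identified from binary calls. It is an illustration only: Meta-SNP itself is a trained machine learning model, whereas `majority_vote` here is a plain vote used as a stand-in, and all function names and the `"disease"`/`"neutral"` labels are assumptions.

```python
# The four methods named in the abstract; the call format is hypothetical.
PREDICTORS = ("PANTHER", "PhD-SNP", "SIFT", "SNAP")


def count_disease_calls(calls):
    """Number of predictors that label the variant as disease-associated."""
    return sum(1 for name in PREDICTORS if calls[name] == "disease")


def agreement_class(calls):
    """Classify a variant by how the predictors agree.

    'unanimous' : all predictors give the same call
    'split'     : exactly half say disease, half say neutral
    'majority'  : anything in between
    """
    n = count_disease_calls(calls)
    if n == 0 or n == len(PREDICTORS):
        return "unanimous"
    if 2 * n == len(PREDICTORS):
        return "split"
    return "majority"


def majority_vote(calls):
    """Simple stand-in combiner (NOT the Meta-SNP model).

    Returns 'disease' or 'neutral', or None when the vote is tied.
    """
    n = count_disease_calls(calls)
    if 2 * n > len(PREDICTORS):
        return "disease"
    if 2 * n < len(PREDICTORS):
        return "neutral"
    return None
```

A plain vote cannot resolve the split cases at all, which is one reason a trained combiner over the individual outputs can be attractive for that subset.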
Affiliation(s)
- Emidio Capriotti
- Division of Informatics, Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA.

104
Polimanti R, Piacentini S, Manfellotto D, Fuciarelli M. Human genetic variation of CYP450 superfamily: analysis of functional diversity in worldwide populations. Pharmacogenomics 2013; 13:1951-60. [PMID: 23215887 DOI: 10.2217/pgs.12.163] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Indexed: 01/21/2023] Open
Abstract
AIM The present study aimed to investigate the human genetic diversity of the CYP450 superfamily in order to identify functional interethnic differences and analyze the role of CYP450 enzymes in human adaptation. MATERIALS & METHODS A computational analysis of genetic and functional differences of the 57 CYP450 genes was performed using the Human Genome Diversity Project and HapMap data, comprising approximately 1694 individuals belonging to 62 human populations. RESULTS Twenty-six CYP450 SNPs with F-statistics significantly different from the general distribution were identified. Some showed high differentiation among human populations, suggesting that functional interethnic differences may be present. Indeed, some of these are significantly associated with drug response or disease risk. Furthermore, our data highlighted that TBXAS1 and genes in the CYP3A cluster may have a role in some processes of human adaptation. CONCLUSION Our study provided an analysis of genetic diversity of the CYP450 superfamily, identifying functional differences among ethnic groups and their related clinical phenotypes.
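For readers unfamiliar with F-statistics, the sketch below computes Wright's fixation index F_ST for a single biallelic SNP from per-population allele frequencies, using F_ST = (H_T - H_S) / H_T. This is a textbook formulation offered as an assumption; the abstract does not state which estimator the authors used, and the function name and arguments are hypothetical.

```python
def fst(freqs, sizes=None):
    """Wright's F_ST for one biallelic SNP.

    freqs : allele frequency of the same allele in each population
    sizes : optional sample sizes used as weights (equal weights if omitted)

    H_T is the expected heterozygosity of the pooled population and
    H_S the weighted mean of the within-population heterozygosities.
    """
    if sizes is None:
        sizes = [1] * len(freqs)
    if len(sizes) != len(freqs) or not freqs:
        raise ValueError("freqs and sizes must be non-empty and equal length")
    total = sum(sizes)
    weights = [n / total for n in sizes]
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        # The SNP is monomorphic overall, so there is no differentiation.
        return 0.0
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    return (h_t - h_s) / h_t
```

A value near 0 means the populations share similar allele frequencies; a value near 1 means the allele is close to fixed in some populations and absent in others, which is the kind of high differentiation the abstract refers to.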
Affiliation(s)
- Renato Polimanti
- Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica 1, 00133 Rome, Italy.

105
Cushing A, Flaherty P, Hopmans E, Bell JM, Ji HP. RVD: a command-line program for ultrasensitive rare single nucleotide variant detection using targeted next-generation DNA resequencing. BMC Res Notes 2013; 6:206. [PMID: 23701658 PMCID: PMC3695852 DOI: 10.1186/1756-0500-6-206] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/17/2012] [Accepted: 05/16/2013] [Indexed: 01/23/2023] Open
Abstract
Background Rare single nucleotide variants play an important role in genetic diversity and heterogeneity of specific human disease. For example, an individual clinical sample can harbor rare mutations at minor frequencies. Genetic diversity within an individual clinical sample is oftentimes reflected in rare mutations. Therefore, detecting rare variants prior to treatment may prove to be a useful predictor for therapeutic response. Current rare variant detection algorithms using next generation DNA sequencing are limited by inherent sequencing error rate and platform availability. Findings Here we describe an optimized implementation of a rare variant detection algorithm called RVD for use in targeted gene resequencing. RVD is available both as a command-line program and for use in MATLAB and estimates context-specific error using a beta-binomial model to call variants with minor allele frequency (MAF) as low as 0.1%. We show that RVD accepts standard BAM formatted sequence files. We tested RVD analysis on multiple Illumina sequencing platforms, among the most widely used DNA sequencing platforms. Conclusions RVD meets a growing need for highly sensitive and specific tools for variant detection. To demonstrate the usefulness of RVD, we carried out a thorough analysis of the software’s performance on synthetic and clinical virus samples sequenced on both an Illumina GAIIx and a MiSeq. We expect RVD can improve understanding of the genetics and treatment of common viral diseases including influenza. RVD is available at the following URL: http://dna-discovery.stanford.edu/software/rvd/.
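To illustrate the idea of calling a variant against a beta-binomial error model, the sketch below computes the probability of seeing at least the observed number of non-reference reads at a position if only sequencing error were present. It is not RVD's implementation: the shape parameters `a` and `b` are assumed to have been estimated elsewhere (for example from a reference sample), and the function names and the default threshold are hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k, n, a, b):
    """P(X = k) for X ~ BetaBinomial(n, a, b), computed in log space."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def tail_prob(k, n, a, b):
    """P(X >= k): chance of k or more error reads out of n by error alone."""
    return min(1.0, sum(betabinom_pmf(i, n, a, b) for i in range(k, n + 1)))


def call_variant(alt_reads, depth, a, b, alpha=1e-6):
    """Call a variant when the error model makes the observed count unlikely.

    alpha is an illustrative significance threshold, not a value from RVD.
    """
    return tail_prob(alt_reads, depth, a, b) < alpha
```

With `a=1, b=999` the assumed mean error rate is 0.1%. At a depth of 10,000 reads, 10 non-reference reads are then consistent with error alone, while 500 are not. The beta component lets the error rate vary between positions, which a plain binomial model would not capture.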
Affiliation(s)
- Anna Cushing
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.

106
Hall D, Huerta MF, McAuliffe MJ, Farber GK. Sharing heterogeneous data: the national database for autism research. Neuroinformatics 2013; 10:331-9. [PMID: 22622767 DOI: 10.1007/s12021-012-9151-4] [Citation(s) in RCA: 99] [Impact Index Per Article: 8.3] [Indexed: 12/20/2022]
Abstract
The National Database for Autism Research (NDAR) is a secure research data repository designed to promote scientific data sharing and collaboration among autism spectrum disorder investigators. The goal of the project is to accelerate scientific discovery through data sharing, data harmonization, and the reporting of research results. Data from over 25,000 research participants are available to qualified investigators through the NDAR portal. Summary information about the available data is available to everyone through that portal.
Affiliation(s)
- Dan Hall
- OMNITEC Solutions, Inc., 6001 Executive Boulevard, Suite 7161, Rockville, MD 20892-9640, USA.

107
Whole-genome and whole-exome sequencing in hereditary cancer: impact on genetic testing and counseling. Cancer J 2013; 18:287-92. [PMID: 22846728 DOI: 10.1097/ppo.0b013e318262467e] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 11/25/2022]
Abstract
The incorporation of whole-genome and whole-exome sequencing into clinical practice will undoubtedly change the way genetic counselors and other clinicians approach genetic testing. Enabling the analysis of essentially all human genes in one comprehensive test, this new technology can result in reduced testing cost and time to diagnosis. Another consequence of this broad scope, however, is the increased amount, complexity, and variety of results a clinician may need to discuss with a patient. The purpose of this article is to review the technology and outline some of the benefits and challenges of whole-genome and whole-exome sequencing in hereditary cancer practice.

108
Emmert-Streib F. Personalized medicine: Has it started yet? A reconstruction of the early history. Front Genet 2013; 3:313. [PMID: 23316213 PMCID: PMC3539161 DOI: 10.3389/fgene.2012.00313] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/28/2012] [Accepted: 12/19/2012] [Indexed: 11/18/2022] Open
Abstract
Within the last few years, the field of personalized medicine has entered the stage. Accompanied by great hopes and expectations, it is believed that this field may have the potential to revolutionize medical and clinical care by utilizing genomics information about the individual patients themselves. In this paper, we reconstruct the early footprints of personalized medicine as reflected by information retrieved from PubMed and Google Scholar. That is, we provide a data-driven perspective on this field to estimate its current status and potential problems.
Affiliation(s)
- Frank Emmert-Streib
- Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Faculty of Medicine, Health and Life Sciences, Queen's University Belfast, Belfast, UK

109
Lopes P, Oliveira JL. COEUS: "semantic web in a box" for biomedical applications. J Biomed Semantics 2012; 3:11. [PMID: 23244467 PMCID: PMC3554586 DOI: 10.1186/2041-1480-3-11] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 03/21/2012] [Accepted: 11/05/2012] [Indexed: 11/30/2022] Open
Abstract
Background As the “omics” revolution unfolds, the growth in data quantity and diversity is bringing about the need for pioneering bioinformatics software, capable of significantly improving the research workflow. To cope with these computer science demands, biomedical software engineers are adopting emerging semantic web technologies that better suit the life sciences domain. The latter’s complex relationships are easily mapped into semantic web graphs, enabling a superior understanding of collected knowledge. Despite increased awareness of semantic web technologies in bioinformatics, their use is still limited. Results COEUS is a new semantic web framework, aiming at a streamlined application development cycle and following a “semantic web in a box” approach. The framework provides a single package including advanced data integration and triplification tools, base ontologies, a web-oriented engine and a flexible exploration API. Resources can be integrated from heterogeneous sources, including CSV and XML files or SQL and SPARQL query results, and mapped directly to one or more ontologies. Advanced interoperability features include REST services, a SPARQL endpoint and LinkedData publication. These enable the creation of multiple applications for web, desktop or mobile environments, and empower a new knowledge federation layer. Conclusions The platform, targeted at biomedical application developers, provides a complete skeleton ready for rapid application deployment, enhancing the creation of new semantic information systems. COEUS is available as open source at http://bioinformatics.ua.pt/coeus/.
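"Triplification", as mentioned in this abstract, means turning tabular records into subject-predicate-object statements. The sketch below shows the general idea for CSV input using only the Python standard library. It does not use or reproduce the COEUS API: the function name, the `example.org` namespaces, and the sample rows are assumptions made for illustration.

```python
import csv
import io


def triplify(csv_text, id_column, base_uri, predicate_ns):
    """Convert CSV rows into (subject, predicate, object) string triples.

    Each row becomes one subject URI built from id_column; every other
    non-empty column becomes one triple with a plain literal object.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base_uri}{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            # Escape backslashes and quotes so the literal stays well formed.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append((subject, f"<{predicate_ns}{column}>", f'"{literal}"'))
    return triples


# Illustrative input only; the namespaces below are placeholders.
SAMPLE = "gene_id,symbol,chromosome\n672,BRCA1,17\n675,BRCA2,13\n"
TRIPLES = triplify(SAMPLE, "gene_id",
                   "http://example.org/gene/",
                   "http://example.org/vocab#")
```

Joining each tuple with spaces and appending ` .` gives lines in an N-Triples-like form. A real integration would map columns to terms from an existing ontology rather than minting predicates from column names.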
Affiliation(s)
- Pedro Lopes
- DETI/IEETA, Universidade de Aveiro, Campus Universitário de Santiago, Aveiro, 3810 - 193, Portugal.

110
Chen R, Snyder M. Promise of personalized omics to precision medicine. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 2012. [PMID: 23184638 DOI: 10.1002/wsbm.1198] [Citation(s) in RCA: 201] [Impact Index Per Article: 15.5] [Indexed: 12/11/2022]
Abstract
The rapid development of high-throughput technologies and computational frameworks enables the examination of biological systems in unprecedented detail. The ability to study biological phenomena at omics levels in turn is expected to lead to significant advances in personalized and precision medicine. Patients can be treated according to their own molecular characteristics. Individual omes as well as the integrated profiles of multiple omes, such as the genome, the epigenome, the transcriptome, the proteome, the metabolome, the antibodyome, and other omics information are expected to be valuable for health monitoring, preventative measures, and precision medicine. Moreover, omics technologies have the potential to transform medicine from traditional symptom-oriented diagnosis and treatment of diseases toward disease prevention and early diagnostics. We discuss here the advances and challenges in systems biology-powered personalized medicine at its current stage, as well as a prospective view of future personalized health care at the end of this review.
Affiliation(s)
- Rui Chen
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA

111
Kim Kjærulff S, Wich L, Kringelum J, Jacobsen UP, Kouskoumvekaki I, Audouze K, Lund O, Brunak S, Oprea TI, Taboureau O. ChemProt-2.0: visual navigation in a disease chemical biology database. Nucleic Acids Res 2012. [PMID: 23185041 PMCID: PMC3531079 DOI: 10.1093/nar/gks1166] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022] Open
Abstract
ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a publicly available compilation of multiple chemical–protein annotation resources integrated with diseases and clinical outcomes information. The database has been updated to >1.15 million compounds with 5.32 million bioactivity measurements for 15 290 proteins. Each protein is linked to quality-scored human protein–protein interaction data based on more than half a million interactions, for studying diseases and biological outcomes (diseases, pathways and GO terms) through protein complexes. In ChemProt-2.0, therapeutic effects as well as adverse drug reactions have been integrated, making it possible to suggest proteins associated with clinical outcomes. New chemical structure fingerprints were computed based on the similarity ensemble approach. Protein sequence similarity search was also integrated to evaluate the promiscuity of proteins, which can help in the prediction of off-target effects. Finally, the database was integrated into a visual interface that enables navigation of the pharmacological space for small molecules. Filtering options were included to facilitate and guide dynamic searches for specific queries.
Affiliation(s)
- Sonny Kim Kjærulff
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Lyngby, Denmark

112
Schmidt BJ, Papin JA, Musante CJ. Mechanistic systems modeling to guide drug discovery and development. Drug Discov Today 2012; 18:116-27. [PMID: 22999913 DOI: 10.1016/j.drudis.2012.09.003] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 05/25/2012] [Revised: 08/17/2012] [Accepted: 09/05/2012] [Indexed: 01/24/2023]
Abstract
A crucial question that must be addressed in the drug development process is whether the proposed therapeutic target will yield the desired effect in the clinical population. Pharmaceutical and biotechnology companies place a large investment on research and development, long before confirmatory data are available from human trials. Basic science has greatly expanded the computable knowledge of disease processes, both through the generation of large omics data sets and a compendium of studies assessing cellular and systemic responses to physiologic and pathophysiologic stimuli. Given inherent uncertainties in drug development, mechanistic systems models can better inform target selection and the decision process for advancing compounds through preclinical and clinical research.
Affiliation(s)
- Brian J Schmidt
- Department of Bioengineering, University of California at San Diego, La Jolla, CA 92093-0412, USA

113
Coutant S, Cabot C, Lefebvre A, Léonard M, Prieur-Gaston E, Campion D, Lecroq T, Dauchel H. EVA: Exome Variation Analyzer, an efficient and versatile tool for filtering strategies in medical genomics. BMC Bioinformatics 2012; 13 Suppl 14:S9. [PMID: 23095660 PMCID: PMC3439720 DOI: 10.1186/1471-2105-13-s14-s9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 01/24/2023] Open
Abstract
Background Whole exome sequencing (WES) has become the strategy of choice to identify a coding allelic variant for a rare human monogenic disorder. This approach is a revolution in the history of medical genetics, impacting both fundamental research and diagnostic methods leading to personalized medicine. A plethora of efficient algorithms has been developed for variant discovery. They generally yield ~20,000 variations that have to be narrowed down to find the potential pathogenic allelic variant(s) and the affected gene(s). For this purpose, commonly adopted procedures involving various filtering strategies have emerged: exclusion of common variations, type of allelic variant, pathogenicity effect prediction, mode of inheritance, and comparison of exomes from multiple individuals. To deal with the expansion of WES in individual medical genomics laboratories, new user-friendly and versatile software tools have to implement these filtering steps. Non-programmer biologists have to be able to combine different filtering criteria on their own and conduct a personal strategy depending on their assumptions and study design. Results We describe EVA (Exome Variation Analyzer), a user-friendly web-interfaced software tool dedicated to filtering strategies for medical WES. Through its different modules, EVA (i) integrates and stores annotated exome variation data as strictly confidential to the project owner, (ii) allows users to combine the main filters dealing with common variations, molecular types, inheritance mode and multiple samples, (iii) offers browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts, and (iv) offers export files and cross-links to useful external databases and software for further prioritization of the small subset of sorted candidate variations and genes. We report a demonstrative case study that identified a new candidate gene related to a rare form of Alzheimer disease. Conclusions EVA was developed to be a user-friendly, versatile, and efficient filtering-assistance software tool for WES. It constitutes a platform for data storage and for drastic screening of clinically relevant genetic variations by non-programmer geneticists. It thereby responds to new needs in the expanding era of medical genomics investigated by WES for both fundamental research and clinical diagnostics.
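The filtering strategies listed in this abstract (excluding common variations, selecting molecular types, and applying an inheritance mode across multiple samples) can be sketched as a chain of simple predicates. The code below is a hypothetical illustration and not EVA's code: the `Variant` fields, the 1% frequency cutoff, the list of effect types, and the choice of a dominant model are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    population_freq: float   # frequency in a reference population
    effect: str              # e.g. "missense", "synonymous"
    genotypes: dict          # sample name -> genotype such as "0/1"


# Illustrative set of molecular types treated as potentially damaging.
DAMAGING_EFFECTS = frozenset({"missense", "nonsense", "frameshift", "splice"})


def is_rare(variant, max_freq=0.01):
    """Exclude common variations (cutoff is an illustrative choice)."""
    return variant.population_freq <= max_freq


def has_damaging_effect(variant):
    """Keep only the selected molecular types."""
    return variant.effect in DAMAGING_EFFECTS


def carries_alt(genotype):
    """True when at least one allele in the genotype is the alternate."""
    return "1" in genotype.replace("|", "/").split("/")


def fits_dominant_model(variant, affected, unaffected):
    """Dominant inheritance: every affected sample carries the variant and
    no unaffected sample does. Samples without a genotype count as
    non-carriers."""
    in_all_cases = all(carries_alt(variant.genotypes.get(s, "./."))
                       for s in affected)
    in_no_controls = not any(carries_alt(variant.genotypes.get(s, "./."))
                             for s in unaffected)
    return in_all_cases and in_no_controls


def filter_variants(variants, affected, unaffected, max_freq=0.01):
    """Apply the three filters in sequence and return surviving variants."""
    return [v for v in variants
            if is_rare(v, max_freq)
            and has_damaging_effect(v)
            and fits_dominant_model(v, affected, unaffected)]
```

Keeping each criterion as its own function mirrors the idea of letting a user combine filters freely; a recessive model, for instance, could be added as another predicate without touching the others.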
Affiliation(s)
- Sophie Coutant
- University of Rouen, INSERM U1079 Molecular genetics of cancer and neuropsychiatric diseases, 76183 Rouen cedex, France

114
Welch BM, Kawamoto K. Clinical decision support for genetically guided personalized medicine: a systematic review. J Am Med Inform Assoc 2012; 20:388-400. [PMID: 22922173 DOI: 10.1136/amiajnl-2012-000892] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Indexed: 12/13/2022] Open
Abstract
OBJECTIVE To review the literature on clinical decision support (CDS) for genetically guided personalized medicine (GPM). MATERIALS AND METHODS MEDLINE and Embase were searched from 1990 to 2011. The manuscripts included were summarized, and notable themes and trends were identified. RESULTS Following a screening of 3416 articles, 38 primary research articles were identified. Focal areas of research included family history-driven CDS, cancer management, and pharmacogenomics. Nine randomized controlled trials of CDS interventions for GPM were identified, seven of which reported positive results. The majority of manuscripts were published on or after 2007, with increased recent focus on genotype-driven CDS and the integration of CDS within primary clinical information systems. DISCUSSION Substantial research has been conducted to date on the use of CDS to enable GPM. In a previous analysis of CDS intervention trials, the automatic provision of CDS as a part of routine clinical workflow had been identified as being critical for CDS effectiveness. There was some indication that CDS for GPM could potentially be effective without the CDS being provided automatically, but we did not find conclusive evidence to support this hypothesis. CONCLUSION To maximize the clinical benefits arising from ongoing discoveries in genetics and genomics, additional research and development is recommended for identifying how best to leverage CDS to bridge the gap between the promise and realization of GPM.
Affiliation(s)
- Brandon M Welch
- Department of Biomedical Informatics and Program in Personalized Health Care, University of Utah, Salt Lake City, UT 84092, USA

115
FunSAV: predicting the functional effect of single amino acid variants using a two-stage random forest model. PLoS One 2012; 7:e43847. [PMID: 22937107 PMCID: PMC3427247 DOI: 10.1371/journal.pone.0043847] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 04/13/2012] [Accepted: 07/26/2012] [Indexed: 11/26/2022] Open
Abstract
Single amino acid variants (SAVs) are the most abundant form of known genetic variations associated with human disease. Successful prediction of the functional impact of SAVs from sequences can thus lead to an improved understanding of the underlying mechanisms of why a SAV may be associated with a certain disease. In this work, we constructed a high-quality structural dataset that contained 679 high-quality protein structures with 2,048 SAVs by collecting the human genetic variant data from multiple resources and dividing them into two categories, i.e., disease-associated and neutral variants. We built a two-stage random forest (RF) model, termed FunSAV, to predict the functional effect of SAVs by combining sequence, structure and residue-contact network features with additional features that were not explored in previous studies. Importantly, a two-step feature selection procedure was proposed to select the most important and informative features that contribute to the prediction of disease association of SAVs. In cross-validation experiments on the benchmark dataset, FunSAV achieved good prediction performance with an area under the curve (AUC) of 0.882, which is competitive with and in some cases better than other existing tools including SIFT, SNAP, Polyphen2, PANTHER, nsSNPAnalyzer and PhD-SNP. The source code of FunSAV and the datasets can be downloaded at http://sunflower.kuicr.kyoto-u.ac.jp/sjn/FunSAV.

116
Valencia A, Hidalgo M. Getting personalized cancer genome analysis into the clinic: the challenges in bioinformatics. Genome Med 2012; 4:61. [PMID: 22839973 PMCID: PMC3580417 DOI: 10.1186/gm362] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022] Open
Abstract
Progress in genomics has raised expectations in many fields, and particularly in personalized cancer research. The new technologies available make it possible to combine information about potential disease markers, altered function and accessible drug targets, which, coupled with pathological and medical information, will help produce more appropriate clinical decisions. The accessibility of such experimental techniques makes it all the more necessary to improve and adapt computational strategies to the new challenges. This review focuses on the critical issues associated with the standard pipeline, which includes: DNA sequencing analysis; analysis of mutations in coding regions; the study of genome rearrangements; extrapolating information on mutations to the functional and signaling level; and predicting the effects of therapies using mouse tumor models. We describe the possibilities, limitations and future challenges of current bioinformatics strategies for each of these issues. Furthermore, we emphasize the need for the collaboration between the bioinformaticians who implement the software and use the data resources, the computational biologists who develop the analytical methods, and the clinicians, the systems' end users and those ultimately responsible for taking medical decisions. Finally, the different steps in cancer genome analysis are illustrated through examples of applications in cancer genome analysis.
Affiliation(s)
- Alfonso Valencia
- Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernández Almagro, 3, E-28029 Madrid, Spain
- Manuel Hidalgo
- Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernández Almagro, 3, E-28029 Madrid, Spain
|
117
|
Skirton H, Jackson L, Goldsmith L, O'Connor A. Genomic medicine: what are the challenges for the National Health Service? Per Med 2012; 9:539-545. [PMID: 29768768 DOI: 10.2217/pme.12.61] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/12/2022]
Abstract
'Personalized medicine' is inextricably linked with current advances in genomics. Although initial claims about the power of genomic tests have been modified, these tests still have the potential to inform a personalized approach to healthcare. Within the health service, genomic testing is being applied in specific situations to inform therapy; however, more robust studies are needed to identify those tests that can make significant improvements to management and prevention of disease. Despite efforts to educate health professionals, genetic literacy remains unsatisfactory, and more efforts are needed to embed genetics in pre- and post-registration professional education and thereby maximize benefit for patients. Primary care and public health may be contexts in which genomics can be utilized for both personalized healthcare and promotion of community health.
Affiliation(s)
- Heather Skirton
- Faculty of Health, Education & Society, Plymouth University, Wellington Road, Taunton, TA1 5YD, UK.
- Leigh Jackson
- Faculty of Health, Education & Society, Plymouth University, Drake Circus, Plymouth, PL4 8AA, UK
- Lesley Goldsmith
- Faculty of Health, Education & Society, Plymouth University, Drake Circus, Plymouth, PL4 8AA, UK
- Anita O'Connor
- Faculty of Health, Education & Society, Plymouth University, Drake Circus, Plymouth, PL4 8AA, UK
|
118
|
Capriotti E, Nehrt NL, Kann MG, Bromberg Y. Bioinformatics for personal genome interpretation. Brief Bioinform 2012; 13:495-512. [PMID: 22247263 PMCID: PMC3404395 DOI: 10.1093/bib/bbr070] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 09/19/2011] [Revised: 11/08/2011] [Indexed: 01/02/2023] Open
Abstract
An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field--the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.
|
119
|
Braun K, Beining M, Wiessler M, Lammers T, Pipkorn R, Hennrich U, Nokihara K, Semmler W, Debus J, Waldeck W. BioShuttle mobility in living cells studied with high-resolution FCS & CLSM methodologies. Int J Med Sci 2012; 9:339-52. [PMID: 22811608 PMCID: PMC3399214 DOI: 10.7150/ijms.4414] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/29/2012] [Accepted: 06/18/2012] [Indexed: 01/04/2023] Open
Abstract
With the increase in molecular diagnostics and patient-specific therapeutic approaches, the delivery and targeting of imaging molecules and pharmacologically active agents gain increasing importance. The ideal delivery system does not exist yet. The realization of two features is indispensable: first, a locally high concentration of target-specific diagnostic and therapeutic molecules; second, the broad development of effective and safe carrier systems. Here we characterize the transport properties of the peptide-based BioShuttle transporter using FFM and CLSM methods. The modular design of BioShuttle-based formulations results in a multi-faceted field of applications, also as a theranostic tool.
Affiliation(s)
- Klaus Braun
- Dept. of Imaging and Radiooncology, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, Germany.
|
120
|
Doncheva NT, Kacprowski T, Albrecht M. Recent approaches to the prioritization of candidate disease genes. Wiley Interdiscip Rev Syst Biol Med 2012; 4:429-42. [PMID: 22689539 DOI: 10.1002/wsbm.1177] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
Abstract
Many efforts are still devoted to the discovery of genes involved with specific phenotypes, in particular, diseases. High-throughput techniques are thus applied frequently to detect dozens or even hundreds of candidate genes. However, the experimental validation of many candidates is often an expensive and time-consuming task. Therefore, a great variety of computational approaches has been developed to support the identification of the most promising candidates for follow-up studies. The biomedical knowledge already available about the disease of interest and related genes is commonly exploited to find new gene-disease associations and to prioritize candidates. In this review, we highlight recent methodological advances in this research field of candidate gene prioritization. We focus on approaches that use network information and integrate heterogeneous data sources. Furthermore, we discuss current benchmarking procedures for evaluating and comparing different prioritization methods.
|
121
|
Primig M. The bioinformatics tool box for reproductive biology. Biochim Biophys Acta Mol Basis Dis 2012; 1822:1880-95. [PMID: 22687534 DOI: 10.1016/j.bbadis.2012.05.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/24/2011] [Revised: 05/04/2012] [Accepted: 05/28/2012] [Indexed: 10/28/2022]
Abstract
Genetics and molecular biology have been instrumental for a better understanding of heritable defects causing human infertility over the past decades. More recently, the field of reproductive biology has harnessed genome biological approaches to gain insight into molecular processes underlying normal and pathological gametogenesis and gamete function. We are currently witnessing yet another quantum leap in our ability to monitor the flow of information from the genome via the transcriptome to the proteome: tiling arrays that cover both strands of a given target genome and RNA-Seq, a method based on ultra-high throughput DNA sequencing, enable us to study noncoding and protein-coding transcripts with unprecedented precision and depth at a reasonable cost. These technologies have spawned a thriving discipline within the bioinformatics field that employs information technology for managing and interpreting biological high-throughput data. This review outlines database projects and online analysis tools useful for life scientists in general and discusses in detail selected projects that have specifically been developed for researchers and clinicians in the field of reproductive biology. This article is part of a Special Issue entitled: Molecular Genetics of Human Reproductive Failure.
Affiliation(s)
- Michael Primig
- Inserm UMR1085-Irset, Université de Rennes 1, Rennes, France.
|
122
|
Rizzo JM, Buck MJ. Key principles and clinical applications of "next-generation" DNA sequencing. Cancer Prev Res (Phila) 2012; 5:887-900. [PMID: 22617168 DOI: 10.1158/1940-6207.capr-11-0432] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.8] [Indexed: 12/24/2022]
Abstract
Demand for fast, inexpensive, and accurate DNA sequencing data has led to the birth and dominance of a new generation of sequencing technologies. So-called "next-generation" sequencing technologies enable rapid generation of data by sequencing massive amounts of DNA in parallel using diverse methodologies which overcome the limitations of Sanger sequencing methods used to sequence the first human genome. Despite opening new frontiers of genomics research, the fundamental shift away from Sanger sequencing that next-generation technologies have created has also left many unaware of the capabilities and applications of these new technologies, especially those in the clinical realm. Moreover, the brisk evolution of sequencing technologies has flooded the market with commercially available sequencing platforms, whose unique chemistries and diverse applications stand as another obstacle restricting the potential of next-generation sequencing. This review serves to provide a primer on next-generation sequencing technologies for clinical researchers and physician scientists. We provide an overview of the capabilities and clinical applications of DNA sequencing technologies to raise awareness among researchers about the power of these novel genomic tools. In addition, we discuss key sequencing principles, provide a comparison between existing and near-term technologies, and outline key advantages and disadvantages of different sequencing platforms to help researchers choose an appropriate platform for their research interests.
Affiliation(s)
- Jason M Rizzo
- Department of Biochemistry and Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, 701 Ellicott St., Buffalo, NY 14203, USA.
|
123
|
Taboureau O, Baell JB, Fernández-Recio J, Villoutreix BO. Established and emerging trends in computational drug discovery in the structural genomics era. Chem Biol 2012; 19:29-41. [PMID: 22284352 DOI: 10.1016/j.chembiol.2011.12.007] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Received: 10/25/2011] [Revised: 12/05/2011] [Accepted: 12/08/2011] [Indexed: 12/01/2022]
Abstract
Bioinformatics and chemoinformatics approaches contribute to hit discovery, hit-to-lead optimization, safety profiling, and target identification and enhance our overall understanding of the health and disease states. A vast repertoire of computational methods has been reported and increasingly combined in order to address more and more challenging targets or complex molecular mechanisms in the context of large-scale integration of structure and bioactivity data produced by private and public drug research. This review explores some key computational methods directly linked to drug discovery and chemical biology with a special emphasis on compound collection preparation, virtual screening, protein docking, and systems pharmacology. A list of generally freely available software packages and online resources is provided, and examples of successful applications are briefly commented upon.
Affiliation(s)
- Olivier Taboureau
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
|
124
|
Olatubosun A, Väliaho J, Härkönen J, Thusberg J, Vihinen M. PON-P: integrated predictor for pathogenicity of missense variants. Hum Mutat 2012; 33:1166-74. [PMID: 22505138 DOI: 10.1002/humu.22102] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Received: 12/07/2011] [Accepted: 03/28/2012] [Indexed: 12/21/2022]
Abstract
High-throughput sequencing data generation demands the development of methods for interpreting the effects of genomic variants. Numerous computational methods have been developed to assess the impact of variations because experimental methods are unable to cope with both the speed and volume of data generation. To harness the strength of currently available predictors, the Pathogenic-or-Not-Pipeline (PON-P) integrates five predictors to predict the probability that nonsynonymous variations affect protein function and may consequently be disease related. Random forest methodology-based PON-P shows consistently improved performance in cross-validation tests and on independent test sets, providing ternary classification and statistical reliability estimate of results. Applied to missense variants in a melanoma cancer cell line, PON-P predicts variants in 17 genes to affect protein function. Previous studies implicate nine of these genes in the pathogenesis of various forms of cancer. PON-P may thus be used as a first step in screening and prioritizing variants to determine deleterious ones for further experimentation.
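The abstract above mentions a ternary classification accompanied by a reliability estimate. One way such a scheme could work is sketched below, under the assumption that several probability estimates are available for a variant (for example from resampled models) and that a call is made only when the whole central interval falls on one side of a threshold. The function name, threshold, and coverage value are hypothetical and are not taken from PON-P.

```python
def ternary_call(prob_samples, threshold=0.5, coverage=0.95):
    """Classify a variant as pathogenic, neutral, or unclassified.

    prob_samples: repeated estimates of P(pathogenic) for one variant,
                  e.g. from models trained on resampled data (assumption).
    """
    if not prob_samples:
        raise ValueError("need at least one probability estimate")
    ordered = sorted(prob_samples)
    n = len(ordered)
    # Indices bounding the central `coverage` fraction of the estimates.
    lo_idx = int((1 - coverage) / 2 * (n - 1))
    hi_idx = (n - 1) - lo_idx
    lower, upper = ordered[lo_idx], ordered[hi_idx]
    if lower > threshold:
        return "pathogenic"
    if upper < threshold:
        return "neutral"
    # The interval straddles the threshold: too unreliable to call.
    return "unclassified"


print(ternary_call([0.80, 0.90, 0.85, 0.70, 0.95]))
print(ternary_call([0.40, 0.60, 0.50]))
```

The third class absorbs variants whose estimates disagree, which trades coverage for reliability on the variants that do receive a call.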
Affiliation(s)
- Ayodeji Olatubosun
- Institute of Biomedical Technology, University of Tampere, Tampere, Finland
|
125
|
Lahti JL, Tang GW, Capriotti E, Liu T, Altman RB. Bioinformatics and variability in drug response: a protein structural perspective. J R Soc Interface 2012; 9:1409-37. [PMID: 22552919 DOI: 10.1098/rsif.2011.0843] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 12/21/2022] Open
Abstract
Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of patients experience adverse effects more severe than anticipated. The unacceptable risk-benefit profile for many drugs mandates a paradigm shift towards personalized medicine. However, prior to adoption of patient-specific approaches, it is useful to understand the molecular details underlying variable drug response among diverse patient populations. Over the past decade, progress in structural genomics led to an explosion of available three-dimensional structures of drug target proteins while efforts in pharmacogenetics offered insights into polymorphisms correlated with differential therapeutic outcomes. Together these advances provide the opportunity to examine how altered protein structures arising from genetic differences affect protein-drug interactions and, ultimately, drug response. In this review, we first summarize structural characteristics of protein targets and common mechanisms of drug interactions. Next, we describe the impact of coding mutations on protein structures and drug response. Finally, we highlight tools for analysing protein structures and protein-drug interactions and discuss their application for understanding altered drug responses associated with protein structural variants.
Affiliation(s)
- Jennifer L Lahti
- Department of Bioengineering, Stanford University, Stanford, CA, USA
|
126
|
Abstract
The field of bioinformatics and computational biology has gone through a number of transformations during the past 15 years, establishing itself as a key component of new biology. This spectacular growth has been challenged by a number of disruptive changes in science and technology. Despite the apparent fatigue of the linguistic use of the term itself, bioinformatics has grown perhaps to a point beyond recognition. We explore both historical aspects and future trends and argue that as the field expands, key questions remain unanswered and acquire new meaning while at the same time the range of applications is widening to cover an ever increasing number of biological disciplines. These trends appear to be pointing to a redefinition of certain objectives, milestones, and possibly the field itself.
Affiliation(s)
- Christos A Ouzounis
- Institute of Agrobiotechnology, Centre for Research & Technology Hellas-CERTH, Thessaloniki, Greece.
|
127
|
Abstract
Background With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework. In this paper, we discuss issues related to the integration of mutation data in the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers publishing this data. Methods A version of the IARC TP53 Mutation database implemented in a relational database was used as a first test set. Automatic mappings to RDF were first created by using D2RQ and later manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest. Since D2RQ query performance is lower than that achievable with an RDF archive, the generated data was also loaded into a dedicated system based on tools from the Jena software suite. Results We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows users to browse the RDF graph by using links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored by using any Semantic Web browser or application. Conclusions This has been the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development. The publication of variation information as Linked Data opens new perspectives: the exploitation of SPARQL searches on mutation data and other biological databases may support data retrieval which is presently not possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine.
Affiliation(s)
- Achille Zappa
- Bioinformatics, IRCCS AOU San Martino-IST National Cancer Research Institute, Genoa, I-16132, Italy
|
128
|
Lachke SA, Ho JWK, Kryukov GV, O'Connell DJ, Aboukhalil A, Bulyk ML, Park PJ, Maas RL. iSyTE: integrated Systems Tool for Eye gene discovery. Invest Ophthalmol Vis Sci 2012; 53:1617-27. [PMID: 22323457 DOI: 10.1167/iovs.11-8839] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.2] [Indexed: 01/08/2023] Open
Abstract
PURPOSE To facilitate the identification of genes associated with cataract and other ocular defects, the authors developed and validated a computational tool termed iSyTE (integrated Systems Tool for Eye gene discovery; http://bioinformatics.udel.edu/Research/iSyTE). iSyTE uses a mouse embryonic lens gene expression data set as a bioinformatics filter to select candidate genes from human or mouse genomic regions implicated in disease and to prioritize them for further mutational and functional analyses. METHODS Microarray gene expression profiles were obtained for microdissected embryonic mouse lens at three key developmental time points in the transition from the embryonic day (E)10.5 stage of lens placode invagination to E12.5 lens primary fiber cell differentiation. Differentially regulated genes were identified by in silico comparison of lens gene expression profiles with those of whole embryo body (WB) lacking ocular tissue. RESULTS Gene set analysis demonstrated that this strategy effectively removes highly expressed but nonspecific housekeeping genes from lens tissue expression profiles, allowing identification of less highly expressed lens disease-associated genes. Among 24 previously mapped human genomic intervals containing genes associated with isolated congenital cataract, the mutant gene is ranked within the top two iSyTE-selected candidates in approximately 88% of cases. Finally, in situ hybridization confirmed lens expression of several novel iSyTE-identified genes. CONCLUSIONS iSyTE is a publicly available Web resource that can be used to prioritize candidate genes within mapped genomic intervals associated with congenital cataract for further investigation. Extension of this approach to other ocular tissue components will facilitate eye disease gene discovery.
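The filtering idea in the abstract above, comparing lens expression with whole-body expression so that broadly expressed housekeeping genes drop down the list, can be illustrated with a short sketch. It assumes that enrichment is scored as a simple ratio with a pseudocount; the abstract does not specify the statistic, and the gene names and expression values below are hypothetical.

```python
def rank_candidates(lens_expr, body_expr, candidates, pseudocount=1.0):
    """Rank candidate genes by lens expression relative to whole-body expression.

    lens_expr, body_expr: dicts mapping gene name -> expression level
    candidates: genes located in the mapped genomic interval
    """
    scored = []
    for gene in candidates:
        if gene not in lens_expr or gene not in body_expr:
            continue  # no expression data for this gene; cannot score it
        # The pseudocount avoids division by zero for silent genes.
        enrichment = (lens_expr[gene] + pseudocount) / (body_expr[gene] + pseudocount)
        scored.append((gene, enrichment))
    # Highest lens enrichment first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical expression values (not data from the paper).
lens = {"GENE_A": 99, "GENE_B": 999, "GENE_C": 9}
body = {"GENE_A": 9, "GENE_B": 999, "GENE_C": 19}
for gene, score in rank_candidates(lens, body, ["GENE_A", "GENE_B", "GENE_C"]):
    print(gene, score)
```

In the toy data, GENE_B has the highest absolute lens expression but is equally abundant in the whole body, so the moderately expressed but lens-enriched GENE_A ranks first.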
Affiliation(s)
- Salil A Lachke
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
|
129
|
Panagiotou G, Taboureau O. The impact of network biology in pharmacology and toxicology. SAR QSAR Environ Res 2012; 23:221-235. [PMID: 22352466 DOI: 10.1080/1062936x.2012.657237] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 05/31/2023]
Abstract
With the need to investigate alternative approaches and emerging technologies in order to increase drug efficacy and reduce adverse drug effects, network biology offers a novel way of approaching drug discovery by considering the effect of a molecule and protein's function in a global physiological environment. By studying drug action across multiple scales of complexity, from molecular to cellular and tissue level, network-based computational methods have the potential to improve our understanding of the impact of chemicals in human health. In this review we present the available large-scale databases and tools that allow integration and analysis of such information for understanding the properties of small molecules in the context of cellular networks. With the recent advances in the omics area, global integrative approaches are necessary to cope with the massive amounts of data, and biomedical researchers are urged to implement new types of analyses that can lead to new therapeutic interventions with increased safety and efficacy compared with existing medications.
Affiliation(s)
- G Panagiotou
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
|
130
|
Koikkalainen J, Pölönen H, Mattila J, van Gils M, Soininen H, Lötjönen J. Improved classification of Alzheimer's disease data via removal of nuisance variability. PLoS One 2012; 7:e31112. [PMID: 22348041 PMCID: PMC3278425 DOI: 10.1371/journal.pone.0031112] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 10/03/2011] [Accepted: 01/02/2012] [Indexed: 11/18/2022] Open
Abstract
Diagnosis of Alzheimer's disease is based on the results of neuropsychological tests and available supporting biomarkers such as the results of imaging studies. The test results and biomarker values depend on nuisance features such as age and gender. To improve diagnostic power, the effects of the nuisance features have to be removed from the data. In this paper, four types of interactions between classification features and nuisance features were identified. Three methods were tested to remove these interactions from the classification data. In stratified analysis, a homogeneous subgroup was generated from a training set. The data correction method used a linear regression model to remove the effects of nuisance features from the data. The third method was a combination of these two methods. The methods were tested using all the baseline data from the Alzheimer's Disease Neuroimaging Initiative database in two classification studies: classifying control subjects from Alzheimer's disease patients and discriminating stable and progressive mild cognitive impairment subjects. The results show that both stratified analysis and data correction can produce statistically significant improvements in the classification accuracy of several neuropsychological tests and imaging biomarkers. The improvements were especially large for the classification of stable and progressive mild cognitive impairment subjects, where the best improvements observed were 6 percentage points. The data correction method gave better results for imaging biomarkers, whereas stratified analysis worked well with the neuropsychological tests. In conclusion, the study shows that the excess variability caused by nuisance features should be removed from the data to improve the classification accuracy and, therefore, the reliability of diagnosis.
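The data correction step described in the abstract above can be illustrated with a small sketch. It assumes a single nuisance feature (age), an ordinary least-squares line fitted on control subjects only, and correction of every subject to a common reference age. These choices, the function names, and the numbers are illustrative assumptions; the paper's actual model may differ.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("nuisance feature has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_nuisance(feature, age, slope, ref_age):
    """Shift each measurement to the value expected at the reference age."""
    return [f - slope * (a - ref_age) for f, a in zip(feature, age)]


# Hypothetical biomarker values for three controls (not ADNI data).
control_age = [60, 70, 80]
control_volume = [10.0, 9.0, 8.0]
slope, intercept = fit_line(control_age, control_volume)

# Correct one control and one patient, both aged 80, to reference age 70.
print(remove_nuisance([8.0, 6.0], [80, 80], slope, ref_age=70))
```

After correction the 80-year-old control matches the younger controls, while the patient's value remains lower, so the remaining difference is no longer confounded with age.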
|
131
|
Haibe-Kains B, Olsen C, Djebbari A, Bontempi G, Correll M, Bouton C, Quackenbush J. Predictive networks: a flexible, open source, web application for integration and analysis of human gene networks. Nucleic Acids Res 2012; 40:D866-75. [PMID: 22096235 PMCID: PMC3245161 DOI: 10.1093/nar/gkr1050] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 08/15/2011] [Revised: 10/09/2011] [Accepted: 10/23/2011] [Indexed: 12/03/2022] Open
Abstract
Genomics provided us with an unprecedented quantity of data on the genes that are activated or repressed in a wide range of phenotypes. We have increasingly come to recognize that defining the networks and pathways underlying these phenotypes requires both the integration of multiple data types and the development of advanced computational methods to infer relationships between the genes and to estimate the predictive power of the networks through which they interact. To address these issues we have developed Predictive Networks (PN), a flexible, open-source, web-based application and data services framework that enables the integration, navigation, visualization and analysis of gene interaction networks. The primary goal of PN is to allow biomedical researchers to evaluate experimentally derived gene lists in the context of large-scale gene interaction networks. The PN analytical pipeline involves two key steps. The first is the collection of a comprehensive set of known gene interactions derived from a variety of publicly available sources. The second is to use these 'known' interactions together with gene expression data to infer robust gene networks. The PN web application is accessible from http://predictivenetworks.org. The PN code base is freely available at https://sourceforge.net/projects/predictivenets/.
Affiliation(s)
- Benjamin Haibe-Kains
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA, Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium, Ontario Cancer Institute, Princess Margaret Hospital/UHN, and the Campbell Family Institute for Cancer Research, University of Toronto, Toronto, ON M5G 1L7, Canada, Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215 and Entagen, Newburyport, MA 01950, USA
| | - Catharina Olsen
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA, Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium, Ontario Cancer Institute, Princess Margaret Hospital/UHN, and the Campbell Family Institute for Cancer Research, University of Toronto, Toronto, ON M5G 1L7, Canada, Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215 and Entagen, Newburyport, MA 01950, USA
| | - Amira Djebbari
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA, Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium, Ontario Cancer Institute, Princess Margaret Hospital/UHN, and the Campbell Family Institute for Cancer Research, University of Toronto, Toronto, ON M5G 1L7, Canada, Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215 and Entagen, Newburyport, MA 01950, USA
| | - Gianluca Bontempi
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA, Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium, Ontario Cancer Institute, Princess Margaret Hospital/UHN, and the Campbell Family Institute for Cancer Research, University of Toronto, Toronto, ON M5G 1L7, Canada, Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215 and Entagen, Newburyport, MA 01950, USA
| | - Mick Correll
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA, Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium, Ontario Cancer Institute, Princess Margaret Hospital/UHN, and the Campbell Family Institute for Cancer Research, University of Toronto, Toronto, ON M5G 1L7, Canada, Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215 and Entagen, Newburyport, MA 01950, USA
| | - Christopher Bouton
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA, Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium, Ontario Cancer Institute, Princess Margaret Hospital/UHN, and the Campbell Family Institute for Cancer Research, University of Toronto, Toronto, ON M5G 1L7, Canada, Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215 and Entagen, Newburyport, MA 01950, USA
| | - John Quackenbush
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA, Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium, Ontario Cancer Institute, Princess Margaret Hospital/UHN, and the Campbell Family Institute for Cancer Research, University of Toronto, Toronto, ON M5G 1L7, Canada, Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215 and Entagen, Newburyport, MA 01950, USA
|
132
|
Mestan KK, Ilkhanoff L, Mouli S, Lin S. Genomic sequencing in clinical trials. J Transl Med 2011; 9:222. [PMID: 22206293 PMCID: PMC3269395 DOI: 10.1186/1479-5876-9-222] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 07/25/2011] [Accepted: 12/30/2011] [Indexed: 12/02/2022]
Abstract
Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to find its way into clinical trials both nationally and worldwide. We highlight the currently available types of genomic sequencing platforms, outline the advantages and disadvantages of each, and compare first- and next-generation techniques with respect to capabilities, quality, and cost. We describe the current geographical distributions and types of disease conditions in which these technologies are used, and how next-generation sequencing is strategically being incorporated into new and existing studies. Lastly, recent major breakthroughs and the ongoing challenges of using genomic sequencing in clinical research are discussed.
Affiliation(s)
- Karen K Mestan
- Department of Pediatrics, Division of Neonatology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.
|
133
|
PLoS Computational Biology Conference Postcards from ISMB/ECCB 2011. PLoS Comput Biol 2011; 7:e1002259. [PMID: 22125481 PMCID: PMC3219612 DOI: 10.1371/journal.pcbi.1002259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
|
134
|
De Baets G, Van Durme J, Reumers J, Maurer-Stroh S, Vanhee P, Dopazo J, Schymkowitz J, Rousseau F. SNPeffect 4.0: on-line prediction of molecular and structural effects of protein-coding variants. Nucleic Acids Res 2011; 40:D935-9. [PMID: 22075996 PMCID: PMC3245173 DOI: 10.1093/nar/gkr996] [Citation(s) in RCA: 211] [Impact Index Per Article: 15.1] [Indexed: 11/13/2022]
Abstract
Single nucleotide variants (SNVs) are, together with copy number variation, the primary source of variation in the human genome and are associated with phenotypic variation such as altered response to drug treatment and susceptibility to disease. Linking structural effects of non-synonymous SNVs to functional outcomes is a major issue in structural bioinformatics. The SNPeffect database (http://snpeffect.switchlab.org) uses sequence- and structure-based bioinformatics tools to predict the effect of protein-coding SNVs on the structural phenotype of proteins. It integrates aggregation prediction (TANGO), amyloid prediction (WALTZ), chaperone-binding prediction (LIMBO) and protein stability analysis (FoldX) for structural phenotyping. Additionally, SNPeffect holds information on affected catalytic sites and a number of post-translational modifications. The database contains all known human protein variants from UniProt, but users can now also submit custom protein variants for a SNPeffect analysis, including automated structure modeling. The new meta-analysis application allows plotting correlations between phenotypic features for a user-selected set of variants.
|
135
|
|
136
|
Capriotti E, Altman RB. A new disease-specific machine learning approach for the prediction of cancer-causing missense variants. Genomics 2011; 98:310-7. [PMID: 21763417 DOI: 10.1016/j.ygeno.2011.06.010] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 02/23/2011] [Revised: 06/26/2011] [Accepted: 06/28/2011] [Indexed: 12/20/2022]
Abstract
High-throughput genotyping and sequencing techniques are rapidly and inexpensively providing large amounts of human genetic variation data. Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability and have been implicated in several human diseases, including cancer. Amino acid mutations resulting from non-synonymous SNPs in coding regions may generate protein functional changes that affect cell proliferation. In this study, we developed a machine learning approach to predict cancer-causing missense variants. We present a Support Vector Machine (SVM) classifier trained on a set of 3163 cancer-causing variants and an equal number of neutral polymorphisms. The method achieves 93% overall accuracy, a correlation coefficient of 0.86, and an area under the ROC curve of 0.98. When compared with previously developed algorithms such as SIFT and CHASM, our method achieves higher prediction accuracy and a higher correlation coefficient in identifying cancer-causing variants.
Affiliation(s)
- Emidio Capriotti
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA.
|