1
Piergentili R, Basile G, Nocella C, Carnevale R, Marinelli E, Patrone R, Zaami S. Using ncRNAs as Tools in Cancer Diagnosis and Treatment-The Way towards Personalized Medicine to Improve Patients' Health. Int J Mol Sci 2022; 23:9353. [PMID: 36012617 PMCID: PMC9409241 DOI: 10.3390/ijms23169353] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 06/10/2022] [Revised: 08/14/2022] [Accepted: 08/16/2022] [Indexed: 12/06/2022]
Abstract
Although the first non-coding RNA (ncRNA) was discovered in 1958, the complexity of the transcriptome has begun to be elucidated only in recent years. Its components are still under investigation, and identifying them is one of the challenges scientists currently face; their functions are also far from fully understood. The non-coding portion of the genome is indeed the largest, both quantitatively and qualitatively. A large fraction of ncRNAs regulate either protein-coding mRNAs or other ncRNAs, creating an intracellular network of cross-interactions (competing endogenous RNA networks, or ceRNETs) that fine-tunes gene expression in both health and disease. Altering the balance among these interactions can be enough to cause a transition from health to disease, and the opposite is equally true, which makes it possible to intervene on these mechanisms to treat human conditions. In this review, we summarize current knowledge of these mechanisms and illustrate how they can be used for disease treatment. We also discuss the current challenges and pitfalls, the roles of environmental and lifestyle-related contributing factors, and the ethical, legal, and social issues arising from their (improper) use.
Affiliation(s)
- Roberto Piergentili
- Institute of Molecular Biology and Pathology, Italian National Research Council (CNR-IBPM), 00185 Rome, Italy
- Giuseppe Basile
- Trauma Unit and Emergency Department, IRCCS Galeazzi Orthopedics Institute, 20161 Milan, Italy
- Head of Legal Medicine Unit, Clinical Institute San Siro, 20148 Milan, Italy
- Cristina Nocella
- Department of Clinical Internal, Anaesthesiological and Cardiovascular Sciences, “Sapienza” University of Rome, Viale del Policlinico, 155, 00161 Rome, Italy
- Roberto Carnevale
- Department of Medico-Surgical Sciences and Biotechnologies, “Sapienza” University of Rome, 04100 Latina, Italy
- Mediterranea Cardiocentro-Napoli, Via Orazio, 80122 Naples, Italy
- Enrico Marinelli
- Department of Medico-Surgical Sciences and Biotechnologies, “Sapienza” University of Rome, 04100 Latina, Italy
- Renato Patrone
- PhD ICTH, University of Federico II, HPB Department INT F. Pascale IRCCS of Naples, Via Mariano Semmola, 80131 Naples, Italy
- Simona Zaami
- Department of Anatomical, Histological, Forensic and Orthopedic Sciences, Section of Forensic Medicine, “Sapienza” University of Rome, 00161 Rome, Italy
2
Park YS, Kim S, Park DG, Kim DH, Yoon KW, Shin W, Han K. Comparison of library construction kits for mRNA sequencing in the Illumina platform. Genes Genomics 2019; 41:1233-1240. [PMID: 31350733 DOI: 10.1007/s13258-019-00853-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/07/2019] [Accepted: 07/15/2019] [Indexed: 12/28/2022]
Abstract
BACKGROUND: The emergence of next-generation sequencing (NGS) technologies has contributed tremendously to transcriptome analysis in the biological sciences. Since the advent of NGS technology in 2007, Illumina, Inc. has provided one of the most widely used sequencing platforms for NGS analysis. OBJECTIVE: Although the reagents and protocols provided by Illumina perform adequately in transcriptome sequencing, relatively cost-effective alternative reagents and protocols have recently become available. However, kits from different manufacturers each have advantages and disadvantages for transcriptome library construction. METHODS: We compared a variety of protocols for producing Illumina-compatible transcriptome libraries. Three mRNA sequencing kits were selected for this study: TruSeq® RNA Sample Preparation V2 (Illumina, Inc., USA), Universal Plus mRNA-Seq (NuGEN, Ltd., UK), and NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® (New England BioLabs, Ltd., USA). We compared them in terms of cost, experimental time, and data output. RESULTS: The quality and quantity of the sequencing data were strongly influenced by the type of library construction kit. This suggests that, for transcriptome studies, researchers should select a library construction kit suited to the goals and resources of their experiments. CONCLUSION: The present work will help researchers choose the right sequencing library construction kit for transcriptome analyses.
Affiliation(s)
- Yong-Soo Park
- Department of Equine Industry, Korea National College of Agriculture and Fisheries, Jeonju, 54874, Republic of Korea
- Songmi Kim
- Department of Nanobiomedical Science and BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan, 31116, Republic of Korea
- Center for Bio-Medical Engineering Core Facility, Dankook University, Cheonan, 31116, Republic of Korea
- Dong-Guk Park
- Department of Surgery, Dankook University College of Medicine, Cheonan, 31116, Republic of Korea
- Dong Hee Kim
- Department of Anesthesiology and Pain Management, Dankook University College of Medicine, Cheonan, 31116, Republic of Korea
- Kyeong-Wook Yoon
- Department of Neurosurgery, Dankook University College of Medicine, Cheonan, 31116, Republic of Korea
- Wonseok Shin
- Department of Nanobiomedical Science and BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan, 31116, Republic of Korea
- Center for Bio-Medical Engineering Core Facility, Dankook University, Cheonan, 31116, Republic of Korea
- Kyudong Han
- Department of Nanobiomedical Science and BK21 PLUS NBM Global Research Center for Regenerative Medicine, Dankook University, Cheonan, 31116, Republic of Korea
- Center for Bio-Medical Engineering Core Facility, Dankook University, Cheonan, 31116, Republic of Korea
3
Pan B, Kusko R, Xiao W, Zheng Y, Liu Z, Xiao C, Sakkiah S, Guo W, Gong P, Zhang C, Ge W, Shi L, Tong W, Hong H. Similarities and differences between variants called with human reference genome HG19 or HG38. BMC Bioinformatics 2019; 20:101. [PMID: 30871461 PMCID: PMC6419332 DOI: 10.1186/s12859-019-2620-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 12/30/2022]
Abstract
Background: Reference genome selection is a prerequisite for successful analysis of next-generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on single-nucleotide variant (SNV) identification has not been rigorously assessed. Methods: We compared the SNVs identified with HG19 versus HG38, using whole-genome sequencing (WGS) data from the Genome in a Bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly, we calculated conversion rates, analyzed discordance rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. Results: The conversion rates from HG38 to HG19 (average 95%) were lower than those from HG19 to HG38 (average 99%). The conversion rates varied slightly among calling pipelines. Around 1.5% of SNVs were discordantly converted between HG19 and HG38. Conversion from HG38 to HG19 produced more SNVs that failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low-confidence SNVs as defined by GIAB, and/or were enriched for G/C alleles (52% observed versus 42% expected). Conclusion: A significant number of SNVs could not be converted between HG19 and HG38. Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. In summary, our findings suggest caution when translating identified SNVs between different versions of the human reference genome. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2620-0) contains supplementary material, which is available to authorized users.
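The conversion and discordance rates described above can be illustrated with a minimal Python sketch. It assumes each SNV is a (chromosome, position, ref, alt) tuple. It also assumes that a liftover result is already available as a dictionary mapping each source SNV to its target-assembly SNV, or to None when conversion fails. The rate definitions and function names are illustrative assumptions, not the exact formulas or conversion tools used by the authors.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical SNV key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def conversion_stats(
    calls_src: Set[Variant],
    calls_dst: Set[Variant],
    liftover: Dict[Variant, Optional[Variant]],
) -> Tuple[float, float, List[Variant]]:
    """Compare SNVs called on a source assembly with those called on a target assembly.

    Assumed definitions (illustrative only):
      conversion rate = converted source calls / all source calls
      discordant rate = converted calls absent from the target call set / converted calls
    """
    # Keep only source calls that the liftover mapping could place on the target assembly.
    converted = [liftover[v] for v in calls_src if liftover.get(v) is not None]
    conversion_rate = len(converted) / len(calls_src) if calls_src else 0.0
    # A converted call is discordant if the target-assembly pipeline did not call it.
    discordant = [v for v in converted if v not in calls_dst]
    discordant_rate = len(discordant) / len(converted) if converted else 0.0
    return conversion_rate, discordant_rate, discordant


def gc_allele_fraction(variants: Iterable[Variant]) -> float:
    """Fraction of ref/alt alleles that are G or C (a simple G/C enrichment check)."""
    alleles = [allele for v in variants for allele in (v[2], v[3])]
    return sum(a in ("G", "C") for a in alleles) / len(alleles) if alleles else 0.0
```

Consider toy data in which three of four HG19 calls lift over to HG38 and only two of those three are also called on HG38. On that data, the sketch returns a conversion rate of 0.75 and a discordant rate of 1/3.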
Affiliation(s)
- Bohu Pan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Wenming Xiao
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Yuanting Zheng
- Center for Pharmacogenomics, Fudan University, Shanghai, China
- Zhichao Liu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Sugunadevi Sakkiah
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Wenjing Guo
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Ping Gong
- Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA
- Chaoyang Zhang
- School of Computing, The University of Southern Mississippi, Hattiesburg, MS, 39406, USA
- Weigong Ge
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Leming Shi
- Center for Pharmacogenomics, Fudan University, Shanghai, China
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
4
Dong L, Wu N, Wang S, Cheng Y, Han L, Zhao J, Long X, Mu K, Li M, Wei L, Wang W, Zhang W, Cao Y, Liu J, Yu J, Hao X. Detection of novel germline mutations in six breast cancer predisposition genes by targeted next-generation sequencing. Hum Mutat 2018; 39:1442-1455. [PMID: 30039884 DOI: 10.1002/humu.23597] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 11/20/2017] [Revised: 07/13/2018] [Accepted: 07/18/2018] [Indexed: 11/10/2022]
Abstract
In this study, a customized amplicon-based targeted sequencing panel was designed to enrich all exonic regions of six genes associated with breast cancer risk. Targeted next-generation sequencing (NGS) was performed for 146 breast cancer patients (BC), 71 healthy women with a family history of breast cancer (high risk), and 55 healthy women without a family history of cancer (control). Sixteen possible disease-causing mutations in four genes were identified in 20 samples. The percentages of carriers of possible disease-causing mutations in the BC group (8.9%) and the high-risk group (8.5%) were higher than in the control group (1.8%). Carriers of possible disease-causing BRCA1 mutations had a higher prevalence of family history and triple-negative breast cancer. Carriers of possible disease-causing BRCA2 mutations were younger and more likely to develop axillary lymph node metastasis (P < 0.05). Among the 146 patients, 47 with a family history of breast cancer were also sequenced for another 14 moderate-risk genes, revealing three additional possible disease-causing mutations, one each in PALB2, CHEK2, and PMS2. These results indicate that the six-gene targeted NGS panel may provide an approach to assess the genetic risk of breast cancer and predict the clinical prognosis of breast cancer patients.
Affiliation(s)
- Li Dong
- Cancer Molecular Diagnostics Core, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin, China
- Nan Wu
- Cancer Prevention Center, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Yanan Cheng
- Cancer Molecular Diagnostics Core, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin, China
- Lei Han
- Cancer Molecular Diagnostics Core, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin, China
- Jing Zhao
- The Second Department of Breast Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Xinxin Long
- Department of Oncology, Tengzhou Central People's Hospital, Tengzhou, P.R. China
- Kun Mu
- Department of Breast Surgery, Hebei Province Cangzhou Hospital of Integrated Traditional and Western Medicine (Cangzhou No. 2 Hospital), Cangzhou, P.R. China
- Menghui Li
- Cancer Prevention Center, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Lijuan Wei
- Cancer Prevention Center, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Weijia Zhang
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Yandong Cao
- Analyses Technology Co. Ltd., Beijing, China
- Juntian Liu
- Cancer Prevention Center, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- The Second Department of Breast Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Jinpu Yu
- Cancer Molecular Diagnostics Core, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin, China
- Xishan Hao
- Cancer Molecular Diagnostics Core, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin, China
5
D'Souza M, Sulakhe D, Wang S, Xie B, Hashemifar S, Taylor A, Dubchak I, Conrad Gilliam T, Maltsev N. Strategic Integration of Multiple Bioinformatics Resources for System Level Analysis of Biological Networks. Methods Mol Biol 2017; 1613:85-99. [PMID: 28849559 DOI: 10.1007/978-1-4939-7027-8_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Recent technological advances in genomics allow biological data to be produced at unprecedented tera- and petabyte scales. Efficient mining of these vast, complex datasets for biomedical research critically depends on seamlessly integrating clinical, genomic, and experimental information with prior knowledge of genotype-phenotype relationships. Experimental data accumulated in public databases should be accessible to the variety of algorithms and analytical pipelines that drive computational analysis and data mining. We present Lynx (Sulakhe et al., Nucleic Acids Res 44:D882-D887, 2016) (http://lynx.cri.uchicago.edu), an integrated computational platform consisting of a web-based database and knowledge extraction engine. It provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization. It also gives public access to the Lynx integrated knowledge base (LynxKB) and its analytical tools via user-friendly web services and interfaces. The Lynx service-oriented architecture supports annotation and analysis of high-throughput experimental data. Lynx tools help users extract meaningful knowledge from LynxKB and experimental data and generate weighted hypotheses about the genes and molecular mechanisms contributing to human phenotypes or conditions of interest. The goal of this integrated platform is to support the end-to-end analytical needs of diverse translational projects.
Affiliation(s)
- Mark D'Souza
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Argonne National Laboratory, Building 221, Room: A142, 9700 South Cass Avenue, Argonne, IL, 60439, USA
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
- Sheng Wang
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
- Bing Xie
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616, USA
- Somaye Hashemifar
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
- Andrew Taylor
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Inna Dubchak
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- T Conrad Gilliam
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
- Natalia Maltsev
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
6
Comparing genetic variants detected in the 1000 genomes project with SNPs determined by the International HapMap Consortium. J Genet 2016; 94:731-40. [PMID: 26690529 DOI: 10.1007/s12041-015-0588-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Abstract
Single-nucleotide polymorphisms (SNPs) determined from SNP arrays by the International HapMap Consortium (HapMap) and the genetic variants detected in the 1000 Genomes Project (1KGP) can serve as two references for genome-wide association studies (GWAS). We conducted comparative analyses to assess concerns about SNP array-based GWAS findings and to realistically bound expectations for next-generation sequencing (NGS)-based GWAS. We calculated and compared base composition, transition-to-transversion ratio, minor allele frequency, and heterozygosity rate for SNPs from HapMap and 1KGP in the 622 individuals common to both. We analysed genotype discordance between HapMap and 1KGP to assess the consistency of SNPs between the two references. Of the 36,817,799 SNPs detected in 1KGP, 90.58% were not measured in HapMap. More SNPs with minor allele frequencies below 0.01 were found in 1KGP than in HapMap. The two references showed low discordance (generally below 0.02) in genotypes of common SNPs, with most discordance arising from heterozygous SNPs. Our study demonstrated that SNP array-based GWAS findings were reliable and useful, although they explained only a small portion of genetic variance. NGS can detect rare as well as common variants, supporting the expectation that NGS-based GWAS will capture a much larger portion of genetic variance than SNP array-based GWAS.
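The per-SNP statistics compared in this abstract can be sketched in Python as follows. The sketch assumes biallelic SNPs with single-base ref/alt alleles. It also assumes genotypes are coded as alternate-allele counts (0, 1, or 2) per diploid individual. This encoding and the function names are illustrative assumptions, not the authors' code.

```python
from typing import Sequence, Tuple

# Transitions swap a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ti_tv_ratio(snps: Sequence[Tuple[str, str]]) -> float:
    """Transition-to-transversion ratio over (ref, alt) pairs, assuming ref != alt."""
    ti = sum(frozenset((ref, alt)) in TRANSITIONS for ref, alt in snps)
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF of one SNP from alternate-allele counts (0, 1 or 2) across diploid individuals."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1.0 - alt_freq)


def heterozygosity_rate(genotypes: Sequence[int]) -> float:
    """Fraction of individuals carrying exactly one alternate allele."""
    return sum(g == 1 for g in genotypes) / len(genotypes)


def genotype_discordance(calls_a: Sequence[int], calls_b: Sequence[int]) -> float:
    """Fraction of shared individuals whose genotype differs between two references."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both references must genotype the same individuals")
    return sum(a != b for a, b in zip(calls_a, calls_b)) / len(calls_a)
```

Running these functions on the same individuals' HapMap and 1KGP genotype calls gives the kind of side-by-side comparison the abstract reports.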
7
Hong H, Slikker W. Advancing translation of biomarkers into regulatory science. Biomark Med 2016; 9:1043-6. [PMID: 26573514 DOI: 10.2217/bmm.15.104] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Affiliation(s)
- Huixiao Hong
- Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, US Food & Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- William Slikker
- Office of the Director, National Center for Toxicological Research, US Food & Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
8
Genomic Discoveries and Personalized Medicine in Neurological Diseases. Pharmaceutics 2015; 7:542-53. [PMID: 26690205 PMCID: PMC4695833 DOI: 10.3390/pharmaceutics7040542] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/23/2015] [Revised: 11/30/2015] [Accepted: 12/02/2015] [Indexed: 12/22/2022]
Abstract
In recent decades, we have witnessed dramatic changes in clinical diagnosis and treatment driven by the genomics and personalized medicine revolutions. Undoubtedly, we have also met many challenges in applying these advanced technologies to drug discovery and development. In this review, we describe the application of genomic information to personal healthcare in general. We also illustrate case examples of genomic discoveries and promising personalized medicine applications in neurological diseases in particular. Available data suggest that individual genomics can be applied to better treat patients in the near future.
9
Ye H, Meehan J, Tong W, Hong H. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine. Pharmaceutics 2015; 7:523-41. [PMID: 26610555 PMCID: PMC4695832 DOI: 10.3390/pharmaceutics7040523] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 10/21/2015] [Revised: 11/14/2015] [Accepted: 11/17/2015] [Indexed: 02/06/2023]
Abstract
Precision medicine, or personalized medicine, has been proposed as a modernized and promising medical strategy. Patients' genetic variants are the key information for implementing precision medicine, and next-generation sequencing (NGS) is an emerging technology for deciphering them. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for aligning short read sequences since 2008, so users must decide which one to use in their studies. Selecting the right approach means choosing not only an alignment algorithm but also a suitable set of parameters for it. Understanding these algorithms therefore helps in selecting the appropriate one for different applications in precision medicine. Here, we review currently available algorithms and their major strategies, such as seed-and-extend and q-gram filtering. We also discuss challenges for current alignment algorithms, including alignment in repetitive regions, long-read alignment, and alignment facilitated by known genetic variants.
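As a rough illustration of the seed-and-extend strategy mentioned above, here is a minimal Python sketch of ungapped alignment with exact k-mer seeds. The read is split into max_mismatches + 1 non-overlapping seeds. By the pigeonhole principle, any placement with at most that many mismatches contains at least one exactly matching seed. Each candidate placement is then extended by counting mismatches over the full read. The hash-table index and the function names are assumptions made for illustration. Production aligners typically rely on compressed indexes such as the FM-index and on gapped extension, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the list of positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_and_extend(read: str, reference: str, max_mismatches: int) -> Optional[Tuple[int, int]]:
    """Return (start, mismatches) of the best ungapped placement of `read`, or None."""
    k = len(read) // (max_mismatches + 1)
    if k == 0:
        raise ValueError("read too short for the requested mismatch budget")
    index = build_kmer_index(reference, k)

    # Seed step: exact hits of each non-overlapping seed propose candidate read starts.
    candidates: Set[int] = set()
    for s in range(max_mismatches + 1):
        offset = s * k
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    # Extend step: score each candidate by Hamming distance over the whole read.
    best: Optional[Tuple[int, int]] = None
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(r != g for r, g in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best
```

The index is rebuilt on every call only because the seed length here depends on each read's length. A real aligner would build one index up front and reuse it across all reads.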
Affiliation(s)
- Hao Ye
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Joe Meehan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA
10
Shen T, Lee A, Shen C, Lin C. The long tail and rare disease research: the impact of next-generation sequencing for rare Mendelian disorders. Genet Res (Camb) 2015; 97:e15. [PMID: 26365496 PMCID: PMC6863629 DOI: 10.1017/s0016672315000166] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 02/23/2014] [Revised: 06/25/2015] [Accepted: 06/29/2015] [Indexed: 12/11/2022]
Abstract
An estimated 6000-8000 rare Mendelian diseases collectively affect 30 million individuals in the United States. The low incidence and prevalence of these diseases present significant challenges to improving diagnostics and treatments. Next-generation sequencing (NGS) technologies have revolutionized research on rare diseases. This article first comments on the effectiveness of NGS through the lens of long-tail economics. We then provide an overview of recent developments and challenges in NGS-based research on rare diseases. As the quality of NGS studies improves and the cost of sequencing decreases, NGS will continue to have a significant impact on the study of rare diseases.
Affiliation(s)
- Tony Shen
- Rare Genomics Institute, 5225 Pooks Hills Road, Suite 1701N, Bethesda, MD 20814, USA
- Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, USA
- Ariel Lee
- Rare Genomics Institute, 5225 Pooks Hills Road, Suite 1701N, Bethesda, MD 20814, USA
- Nova Southeastern University, College of Osteopathic Medicine, 3301 College Avenue, Ft. Lauderdale, FL 333314-796, USA
- Carol Shen
- Rare Genomics Institute, 5225 Pooks Hills Road, Suite 1701N, Bethesda, MD 20814, USA
- Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, USA
- C. Jimmy Lin
- Rare Genomics Institute, 5225 Pooks Hills Road, Suite 1701N, Bethesda, MD 20814, USA
11
Md. SSG, Diego-Álvarez D, Buades C, Romera-López A, Pérez-Cabornero L, Valero-Hervás D, Cantalapiedra D, Bioinformatics, Felipe-Ponce V, Hernández-Poveda G, José Roca M, Casañs C, Fernández-Pedrosa V, M. CC, C. ÁA, P. JCT, C. ÓR, Marco G, Gil M, Miñambres R, Ballester A. Molecular diagnosis of genetic diseases: from genetic diagnosis to genomic diagnosis with massive sequencing [in Spanish]. Revista Médica Clínica Las Condes 2015. [DOI: 10.1016/j.rmclc.2015.07.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
12
Dubchak I, Balasubramanian S, Wang S, Meyden C, Sulakhe D, Poliakov A, Börnigen D, Xie B, Taylor A, Ma J, Paciorkowski AR, Mirzaa GM, Dave P, Agam G, Xu J, Al-Gazali L, Mason CE, Ross ME, Maltsev N, Gilliam TC. An integrative computational approach for prioritization of genomic variants. PLoS One 2014; 9:e114903. [PMID: 25506935 PMCID: PMC4266634 DOI: 10.1371/journal.pone.0114903] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/01/2014] [Accepted: 11/15/2014] [Indexed: 12/27/2022]
Abstract
An essential step in discovering the molecular mechanisms that contribute to disease phenotypes, and in planning experiments efficiently, is the development of weighted hypotheses that estimate the functional effects of sequence variants discovered by high-throughput genomics. As bioinformatics resources become increasingly specialized, creating analytical workflows that seamlessly integrate data and tools developed by multiple groups becomes inevitable. Here we present a case study of a distributed analytical environment integrating four complementary specialized resources: the Lynx platform, VISTA RViewer, the Developmental Brain Disorders Database (DBDB), and the RaptorX server. We used this environment to identify high-confidence candidate genes contributing to the pathogenesis of spina bifida. The analysis led to the prediction and validation of deleterious mutations in the SLC19A placental transporter in mothers of affected children. These mutations narrow the outlet channel and thereby reduce the folate permeation rate. The approach also correctly identified several genes previously shown to contribute to the pathogenesis of spina bifida and suggested additional genes for experimental validation. The study demonstrates that seamless integration of bioinformatics resources enables fast, efficient prioritization and characterization of the genomic factors and molecular networks contributing to phenotypes of interest.
Affiliation(s)
- Inna Dubchak
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
- Sandhya Balasubramanian
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Cem Meyden
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, United States of America
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, United States of America
- Feil Family Brain and Mind Research Institute, Weill Cornell Medical College, New York, New York, United States of America
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, Illinois, United States of America
- Alexander Poliakov
- Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
- Daniela Börnigen
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Bingqing Xie
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois, United States of America
- Andrew Taylor
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Jianzhu Ma
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Alex R. Paciorkowski
- Departments of Neurology, Pediatrics, and Biomedical Genetics and Center for Neural Development and Disease, University of Rochester Medical Center, Rochester, New York, United States of America
- Ghayda M. Mirzaa
- Seattle Children's Research Institute and Department of Pediatrics, University of Washington, Seattle, Washington, United States of America
- Paul Dave
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, Illinois, United States of America
- Gady Agam
- Department of Computer Science, Illinois Institute of Technology, Chicago, Illinois, United States of America
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Lihadh Al-Gazali
- Department of Pediatrics, Faculty of Medicine and Health Sciences, United Arab Emirates University, Al-Ain, UAE
- Christopher E. Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, United States of America
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, United States of America
- Feil Family Brain and Mind Research Institute, Weill Cornell Medical College, New York, New York, United States of America
- M. Elizabeth Ross
- Laboratory of Neurogenetics and Development, Weill Cornell Medical College, New York, New York, United States of America
- Natalia Maltsev
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, Illinois, United States of America
- T. Conrad Gilliam
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Computation Institute, University of Chicago/Argonne National Laboratory, Chicago, Illinois, United States of America
13
Zhang W, Soika V, Meehan J, Su Z, Ge W, Ng HW, Perkins R, Simonyan V, Tong W, Hong H. Quality control metrics improve repeatability and reproducibility of single-nucleotide variants derived from whole-genome sequencing. Pharmacogenomics J 2014; 15:298-309. [PMID: 25384574 DOI: 10.1038/tpj.2014.70] [Received: 03/31/2014] [Revised: 07/16/2014] [Accepted: 09/19/2014]
Abstract
Although many quality control (QC) methods have been developed to improve the quality of single-nucleotide variants (SNVs) during SNV calling, QC methods for use after SNV calling have not been reported. We developed five QC metrics to improve the quality of SNVs using whole-genome-sequencing data from a monozygotic twin pair in the Korean Personal Genome Project. The QC metrics improved both repeatability between the monozygotic twins and reproducibility between SNV-calling pipelines. We demonstrated that the QC metrics improve the reproducibility of SNVs derived not only from whole-genome-sequencing data but also from whole-exome-sequencing data. The QC metrics are calculated from the reference genome used in the alignment, without access to the raw or intermediate data and without knowledge of the SNV-calling details. Therefore, they can be easily adopted in downstream association analyses.
Affiliation(s)
- W Zhang
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- V Soika
- Office of The Center Director, Center for Biologics Evaluation and Research, US Food and Drug Administration, Rockville, MD, USA
- J Meehan
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Z Su
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- W Ge
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- H W Ng
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- R Perkins
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- V Simonyan
- Office of The Center Director, Center for Biologics Evaluation and Research, US Food and Drug Administration, Rockville, MD, USA
- W Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- H Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
14
Zhang W, Meehan J, Su Z, Ng HW, Shu M, Luo H, Ge W, Perkins R, Tong W, Hong H. Whole genome sequencing of 35 individuals provides insights into the genetic architecture of Korean population. BMC Bioinformatics 2014; 15 Suppl 11:S6. [PMID: 25350283 PMCID: PMC4251052 DOI: 10.1186/1471-2105-15-s11-s6]
Abstract
Background: Owing to a significant decline in the costs of next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina HiSeq platform and made the deep sequencing data publicly available, giving the scientific community the opportunity to decipher the genetic architecture of the Korean population.
Methods: In this study, we used two single nucleotide variant (SNV) calling pipelines: the raw reads from whole genome sequencing of the 35 KPGP individuals were mapped with BWA and SOAP2, followed by SNV calling with SAMtools and SOAPsnp, respectively. The consensus SNVs from the two pipelines were used to represent the SNVs of the Korean population. We compared these SNVs with those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs present only in the Korean population. We studied the mutation spectrum and analyzed the genes carrying non-synonymous SNVs detected only in the Korean population.
Results: We detected a total of 8,555,726 SNVs in the 35 Korean individuals. Of the SNVs absent from the 17 other populations, 1,213,613 were detected in at least one Korean individual (SNV-1) and 12,640 in all 35 Korean individuals (SNV-35). In contrast with the SNVs shared with other populations in HapMap and 1KGP, the Korean-only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean-only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean-only SNVs, of which 58 were present in all 35 Korean individuals. The 5,754 genes carrying non-synonymous Korean-only SNVs were highly enriched in some metabolic pathways. We found that adhesion was the top disease term associated with SNV-1 and that Nelson syndrome was the only disease term associated with SNV-35. We also found that a significant number of Korean-only SNVs are in genes associated with the drug term adenosine.
Conclusion: We identified SNVs found in the Korean population but not in other populations, and explored the corresponding genes and pathways as well as the associated disease and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
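The set logic described in this abstract (a consensus of two calling pipelines, then exclusion of SNVs seen in other populations, with SNV-1 and SNV-35 defined by how many individuals carry a variant) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's method: the `(chromosome, position, ref, alt)` tuple representation, the per-individual consensus step, the single pooled set of SNVs from the other populations, the toy data, and the function names `consensus_calls` and `population_specific` are all hypothetical. The study itself worked from BWA/SAMtools and SOAP2/SOAPsnp outputs.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: an SNV is (chromosome, position, ref allele, alt allele).
SNV = Tuple[str, int, str, str]
CallSet = Dict[str, Set[SNV]]  # individual ID -> SNVs called for that individual


def consensus_calls(pipeline_a: CallSet, pipeline_b: CallSet) -> CallSet:
    """Per individual, keep only SNVs reported by both pipelines (set intersection)."""
    return {ind: calls & pipeline_b.get(ind, set()) for ind, calls in pipeline_a.items()}


def population_specific(consensus: CallSet,
                        other_populations: Set[SNV]) -> Tuple[Set[SNV], Set[SNV]]:
    """Return (SNVs in >= 1 individual, SNVs in every individual), minus SNVs seen elsewhere."""
    per_individual = list(consensus.values())
    in_any = set().union(*per_individual)  # analogue of SNV-1 before filtering
    in_all = set.intersection(*per_individual) if per_individual else set()  # SNV-35 analogue
    return in_any - other_populations, in_all - other_populations


# Toy data for illustration only; these are not values from the study.
pipeline_a_calls: CallSet = {
    "ind1": {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("3", 10, "T", "C")},
    "ind2": {("1", 100, "A", "G"), ("2", 50, "G", "A")},
}
pipeline_b_calls: CallSet = {
    "ind1": {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("3", 10, "T", "C")},
    "ind2": {("1", 100, "A", "G")},
}
other_population_snvs: Set[SNV] = {("1", 200, "C", "T")}

if __name__ == "__main__":
    consensus = consensus_calls(pipeline_a_calls, pipeline_b_calls)
    snv_any, snv_all = population_specific(consensus, other_population_snvs)
    print(sorted(snv_any))  # variants carried by at least one individual, absent elsewhere
    print(sorted(snv_all))  # variants carried by every individual, absent elsewhere
```

Plain set operations are used here because the abstract's definitions (consensus, "present in at least one", "present in all", "not in other populations") map directly onto intersection, union, and difference; a real pipeline would also need to normalize variant representations before comparing them.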
15
Brazas MD, Lewitter F, Schneider MV, van Gelder CWG, Palagi PM. A quick guide to genomics and bioinformatics training for clinical and public audiences. PLoS Comput Biol 2014; 10:e1003510. [PMID: 24722068 PMCID: PMC3983038 DOI: 10.1371/journal.pcbi.1003510]
Affiliation(s)
- Fran Lewitter
- Whitehead Institute for Biomedical Research, Massachusetts Institute of Technology, Cambridge, United States of America
- Celia W. G. van Gelder
- Netherlands Bioinformatics Centre and Department of Bioinformatics, Radboud Medical Center, Nijmegen, The Netherlands
16
Caboche S, Audebert C, Hot D. High-Throughput Sequencing, a Versatile Weapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology. Pathogens 2014; 3:258-79. [PMID: 25437800 PMCID: PMC4243446 DOI: 10.3390/pathogens3020258] [Received: 11/20/2013] [Revised: 02/28/2014] [Accepted: 03/20/2014]
Abstract
Recent progress in high-throughput sequencing (HTS) technologies enables easy, low-cost access to whole genome sequencing (WGS) or re-sequencing. Combined with adapted, automated, and fast bioinformatics solutions for sequencing applications, HTS promises accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that HTS data allow genome-based diagnosis consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. To move from concept to routine use, many parameters need to be considered to establish HTS as a powerful tool that helps physicians and clinicians in microbiological investigations. This review highlights the milestones to be reached toward this goal.
Affiliation(s)
- Ségolène Caboche
- FRE 3642 Molecular and Cellular Medicine, CNRS, Institut Pasteur de Lille and University Lille Nord de France, Lille 59019, France.
- David Hot
- FRE 3642 Molecular and Cellular Medicine, CNRS, Institut Pasteur de Lille and University Lille Nord de France, Lille 59019, France.
17
Abstract
The bioinformatics requirements of the clinical environment are very specific, and analytic techniques need to be fit for purpose, robust, and predictable. At the same time, the bewildering amount of information produced during these analyses needs to be carefully managed, used, and correctly interpreted. The challenge for clinical laboratories now is to implement production analytical processes that can handle different experimental approaches on current equipment, while also allowing these systems to evolve to accommodate developments likely to have an impact in the near future. This is complicated by the many options available at each of the critical processing steps, so a clear method needs to be developed for assembling appropriate pipelines. Here, I discuss the issues relevant to developing an informatics pipeline that meets these criteria, which should allow individual laboratories to assess their proposed strategies.
Affiliation(s)
- Richard James Nigel Allcock
- School of Pathology and Laboratory Medicine, University of Western Australia, M574 Stirling Highway, Nedlands, WA, 6009, Australia.
18
Valdés A, Ibáñez C, Simó C, García-Cañas V. Recent transcriptomics advances and emerging applications in food science. Trends Analyt Chem 2013. [DOI: 10.1016/j.trac.2013.06.014]
19
Chen G, Shi T. Next-generation sequencing technologies for personalized medicine: promising but challenging. Sci China Life Sci 2013; 56:101-3. [PMID: 23393024 DOI: 10.1007/s11427-013-4436-x] [Received: 01/05/2013]