Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

18
(from Reference Citation Analysis)

Article PDFs (6)

Cited by > 0 (12)

Searched Name

Ibrahim Numanagić

Citation Analysis

Zhou Q, Ghezelji M, Hari A, Ford MKB, Holley C, Mirabello L, Chanock S, Sahinalp SC, Numanagić I. Geny: A Genotyping Tool for Allelic Decomposition of Killer Cell Immunoglobulin-Like Receptor Genes. bioRxiv 2024:2024.02.27.582413. [PMID: 38529502 PMCID: PMC10962708 DOI: 10.1101/2024.02.27.582413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]

Shugg T, Ly RC, Osei W, Rowe EJ, Granfield CA, Lynnes TC, Medeiros EB, Hodge JC, Breman AM, Schneider BP, Sahinalp SC, Numanagić I, Salisbury BA, Bray SM, Ratcliff R, Skaar TC. Computational pharmacogenotype extraction from clinical next-generation sequencing. Front Oncol 2023;13:1199741. [PMID: 37469403 PMCID: PMC10352904 DOI: 10.3389/fonc.2023.1199741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 05/22/2023] [Indexed: 07/21/2023]

Abstract

Background

Next-generation sequencing (NGS), including whole genome sequencing (WGS) and whole exome sequencing (WES), is increasingly being used for clinical care. While NGS data have the potential to be repurposed to support clinical pharmacogenomics (PGx), current computational approaches have not been widely validated using clinical data. In this study, we assessed the accuracy of the Aldy computational method to extract PGx genotypes from WGS and WES data for 14 and 13 major pharmacogenes, respectively.

Methods

Germline DNA was isolated from whole blood samples collected for 264 patients seen at our institutional molecular solid tumor board. DNA was used for panel-based genotyping within our institutional Clinical Laboratory Improvement Amendments- (CLIA-) certified PGx laboratory. DNA was also sent to other CLIA-certified commercial laboratories for clinical WGS or WES. Aldy v3.3 and v4.4 were used to extract PGx genotypes from these NGS data, and results were compared to the panel-based genotyping reference standard that contained 45 star allele-defining variants within CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, CYP4F2, DPYD, G6PD, NUDT15, SLCO1B1, TPMT, and VKORC1.

Results

Mean WGS read depth was >30x for all variant regions except for G6PD (average read depth was 29 reads), and mean WES read depth was >30x for all variant regions. For 94 patients with WGS, Aldy v3.3 diplotype calls were concordant with those from the genotyping reference standard in 99.5% of cases when excluding diplotypes with additional major star alleles not tested by targeted genotyping, ambiguous phasing, and CYP2D6 hybrid alleles. Aldy v3.3 identified 15 additional clinically actionable star alleles not covered by genotyping within CYP2B6, CYP2C19, DPYD, SLCO1B1, and NUDT15. Within the WGS cohort, Aldy v4.4 diplotype calls were concordant with those from genotyping in 99.7% of cases. When excluding patients with CYP2D6 copy number variation, all Aldy v4.4 diplotype calls except for one CYP3A4 diplotype call were concordant with genotyping for 161 patients in the WES cohort.

Conclusion

Aldy v3.3 and v4.4 called diplotypes for major pharmacogenes from clinical WES and WGS data with >99% accuracy. These findings support the use of Aldy to repurpose clinical NGS data to inform clinical PGx.

Affiliation(s)

Tyler Shugg Division of Clinical Pharmacology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States
Reynold C. Ly Division of Diagnostic Genetics and Genomics, Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
Wilberforce Osei Division of Clinical Pharmacology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States
Elizabeth J. Rowe Division of Clinical Pharmacology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States
Caitlin A. Granfield Division of Diagnostic Genetics and Genomics, Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
Ty C. Lynnes Division of Diagnostic Genetics and Genomics, Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
Elizabeth B. Medeiros Division of Diagnostic Genetics and Genomics, Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
Jennelle C. Hodge Division of Diagnostic Genetics and Genomics, Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
Amy M. Breman Division of Diagnostic Genetics and Genomics, Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, United States
Bryan P. Schneider Division of Hematology/Oncology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States
S. Cenk Sahinalp Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
Ibrahim Numanagić Department of Computer Science, University of Victoria, Victoria, BC, Canada
Benjamin A. Salisbury LifeOmic Inc., Indianapolis, IN, United States
Steven M. Bray LifeOmic Inc., Indianapolis, IN, United States
Ryan Ratcliff LifeOmic Inc., Indianapolis, IN, United States
Todd C. Skaar Division of Clinical Pharmacology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, United States

Hari A, Zhou Q, Gonzaludo N, Harting J, Scott SA, Qin X, Scherer S, Sahinalp SC, Numanagić I. An efficient genotyper and star-allele caller for pharmacogenomics. Genome Res 2023;33:61-70. [PMID: 36657977 PMCID: PMC9977157 DOI: 10.1101/gr.277075.122] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/11/2022] [Accepted: 12/12/2022] [Indexed: 01/20/2023]

Osei WA, Shugg T, Ly RC, Bray SM, Salisbury BA, Ratcliff RR, Pratt VM, Numanagić I, Skaar T. Abstract 1151: Pharmacogenomics genotyping from clinical somatic whole exome sequencing: Aldy, a computational tool. Cancer Res 2022. [DOI: 10.1158/1538-7445.am2022-1151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Abstract

Background

Pharmacogenomics (PGx) testing can reduce toxicities and improve efficacy of several drugs used to treat cancer and associated symptoms. PGx results can be determined from germline whole-exome sequencing (WES), but somatic mutations may cause discordance between tumor and germline DNA. Since clinical diagnostic sequencing in oncology frequently only includes tumor DNA, there would be clinical value in calling germline PGx genotypes from tumor DNA. Thus, the purpose of this study was to assess the feasibility of using somatic WES data to call germline PGx genotypes.

Methods

Germline and somatic WES data were obtained as part of the clinical workflow for 64 patients treated at the solid molecular tumor board clinic at Indiana University. Aldy v3.3 was implemented in LifeOmic’s Precision Health Cloud™ to call PGx genotypes from somatic WES. Somatic Aldy calls were compared with previously validated Aldy germline calls for 8 genes: CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, CYP4F2, DPYD, and TPMT. Somatic read depth was >100x, except for the intronic CYP3A4*22 variant, which was >30x.

Results

Somatic and germline Aldy calls were compared for a total of 512 genotypes and 56 (11%) calls were discordant. Discordant calls were most common for CYP2B6 (23.4%), followed by CYP2D6 (14.1%), CYP2C19 (10.9%), CYP2C8 (6.3%), and DPYD (6.3%). In contrast, all Aldy calls were concordant for CYP3A5 and TPMT. 38 out of 64 subjects (59%) had discordant calls for at least one gene. The most common first cancer diagnoses in our cohort were colorectal (9.3%), breast (7.8%), and pancreatic (7.8%), and the rates of discordant Aldy calls did not differ by cancer type (p>0.05 for all cancer types). Based on our analyses of discordant calls, we anticipate that adjusting Aldy’s thresholds for variant calling may allow Aldy to determine genotypes from somatic WES data.

Conclusion

In most cases, genotype calls of drug metabolism genes from tumor DNA reflected the germline genotypes; however, additional work needs to be done to determine if the remaining discordant calls can be corrected by modifying the informatics tools or if they are due to somatic mutations.

Citation Format: Wilberforce A. Osei, Tyler Shugg, Reynold C. Ly, Steven M. Bray, Benjamin A. Salisbury, Ryan R. Ratcliff, Victoria M. Pratt, Ibrahim Numanagić, Todd Skaar. Pharmacogenomics genotyping from clinical somatic whole exome sequencing: Aldy, a computational tool [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1151.
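The headline percentages in the abstract above follow directly from the counts it reports. The short Python check below is only a reader's sketch of that arithmetic; the variable names are illustrative and nothing here comes from Aldy or from the study's own analysis code.

```python
# Reader's arithmetic check using only the counts quoted in the abstract.
# All names are illustrative; this is not the authors' analysis code.

def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

patients = 64                   # patients with germline and somatic WES
genotypes_compared = 512        # total somatic-vs-germline comparisons
discordant_calls = 56           # comparisons where the two calls differed
patients_with_discordance = 38  # patients with at least one discordant gene

# 512 comparisons over 64 patients implies 8 genes compared per patient.
genes_per_patient = genotypes_compared // patients

discordant_pct = percent(discordant_calls, genotypes_compared)
patient_pct = percent(patients_with_discordance, patients)

print(f"genes per patient: {genes_per_patient}")
print(f"discordant calls: {discordant_pct:.1f}%")
print(f"patients with any discordance: {patient_pct:.1f}%")
```

Rounded to whole numbers, the two percentages match the 11% and 59% given in the abstract, and the implied eight genes per patient agrees with the gene count stated in its Methods.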

Smajlović H, Shajii A, Berger B, Cho H, Numanagić I. Sequre: a high-performance framework for rapid development of secure bioinformatics pipelines. IEEE Int Symp Parallel Distrib Process Workshops Phd Forum 2022;2022:164-165. [PMID: 35958356 PMCID: PMC9364365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Gaedigk A, Boone EC, Scherer SE, Lee SB, Numanagić I, Sahinalp C, Smith JD, McGee S, Radhakrishnan A, Qin X, Wang WY, Farrow EG, Gonzaludo N, Halpern AL, Nickerson DA, Miller NA, Pratt VM, Kalman LV. CYP2C8, CYP2C9, and CYP2C19 Characterization Using Next-Generation Sequencing and Haplotype Analysis: A GeT-RM Collaborative Project. J Mol Diagn 2022;24:337-350. [PMID: 35134542 PMCID: PMC9069873 DOI: 10.1016/j.jmoldx.2021.12.011] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 11/10/2021] [Revised: 12/09/2021] [Accepted: 12/28/2021] [Indexed: 01/13/2023]

Affiliation(s)

Andrea Gaedigk Division of Clinical Pharmacology, Toxicology and Therapeutic Innovation, Children's Mercy Kansas City, Kansas City, Missouri; University of Missouri-Kansas City School of Medicine, Kansas City, Missouri
Erin C Boone Division of Clinical Pharmacology, Toxicology and Therapeutic Innovation, Children's Mercy Kansas City, Kansas City, Missouri
Steven E Scherer Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
Seung-Been Lee Precision Medicine Institute, Macrogen Inc., Seongnam, Republic of Korea
Ibrahim Numanagić Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada
Cenk Sahinalp Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
Joshua D Smith Department of Genome Sciences, University of Washington, Seattle, Washington
Sean McGee Department of Genome Sciences, University of Washington, Seattle, Washington
Aparna Radhakrishnan Department of Genome Sciences, University of Washington, Seattle, Washington
Xiang Qin Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
Wendy Y Wang Division of Clinical Pharmacology, Toxicology and Therapeutic Innovation, Children's Mercy Kansas City, Kansas City, Missouri
Emily G Farrow University of Missouri-Kansas City School of Medicine, Kansas City, Missouri; Center for Genomic Medicine, Children's Mercy Kansas City, Kansas City, Missouri
Nina Gonzaludo Medical Genomics Research, Illumina Inc., San Diego, California
Aaron L Halpern Medical Genomics Research, Illumina Inc., San Diego, California
Deborah A Nickerson Department of Genome Sciences, University of Washington, Seattle, Washington
Neil A Miller University of Missouri-Kansas City School of Medicine, Kansas City, Missouri; Center for Genomic Medicine, Children's Mercy Kansas City, Kansas City, Missouri
Victoria M Pratt Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana
Lisa V Kalman Informatics and Data Science Branch, Division of Laboratory Systems, Centers for Disease Control and Prevention, Atlanta, Georgia.

Išerić H, Alkan C, Hach F, Numanagić I. Fast characterization of segmental duplication structure in multiple genome assemblies. Algorithms Mol Biol 2022;17:4. [PMID: 35303886 PMCID: PMC8932185 DOI: 10.1186/s13015-022-00210-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/16/2021] [Accepted: 02/08/2022] [Indexed: 11/29/2022]

Abstract

Motivation

The increasing availability of high-quality genome assemblies raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time local alignment algorithms that are impractical due to the size of most genomes. Additionally, to perform evolutionary analysis, one needs to characterize SDs in multiple genomes and find relations between those SDs and unique (non-duplicated) segments in other genomes. A naïve approach consisting of multiple sequence alignment would make the optimal solution to this problem even more impractical. Thus there is a need for fast and accurate algorithms to characterize SD structure in multiple genome assemblies to better understand the evolutionary forces that shaped the genomes of today.

Results

Here we introduce a new approach, BISER, to quickly detect SDs in multiple genomes and identify elementary SDs and core duplicons that drive the formation of such SDs. BISER improves earlier tools by (i) scaling the detection of SDs with low homology to multiple genomes while introducing further 7–33× speed-ups over the existing tools, and by (ii) characterizing elementary SDs and detecting core duplicons to help trace the evolutionary history of duplications to as far as 300 million years.

Availability and implementation

BISER is implemented in Seq programming language and is publicly available at https://github.com/0xTCG/biser.

Ly R, Shugg T, Ratcliff R, Osei W, Pratt V, Schneider B, Radovich M, Bray S, Salisbury B, Parikh B, Sahinalp SC, Numanagić I, Skaar T. eP373: Analytical validation of a computational method for pharmacogenetic genotyping from clinical exome sequencing. Genet Med 2022. [DOI: 10.1016/j.gim.2022.01.408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

Shajii A, Numanagić I, Leighton AT, Greenyer H, Amarasinghe S, Berger B. A Python-based programming language for high-performance computational genomics. Nat Biotechnol 2021;39:1062-1064. [PMID: 34282326 PMCID: PMC8542382 DOI: 10.1038/s41587-021-00985-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]

Berger E, Yorukoglu D, Zhang L, Nyquist SK, Shalek AK, Kellis M, Numanagić I, Berger B. Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets. Nat Commun 2020;11:4662. [PMID: 32938926 PMCID: PMC7494856 DOI: 10.1038/s41467-020-18320-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/15/2019] [Accepted: 08/07/2020] [Indexed: 01/04/2023]

Shajii A, Numanagić I, Baghdadi R, Berger B, Amarasinghe S. Seq: A High-Performance Language for Bioinformatics. Proc ACM Program Lang 2019;3:125. [PMID: 35775031 PMCID: PMC9241673 DOI: 10.1145/3360551] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]

Abstract

The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing are becoming radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100 (a factor of over 10⁶), and the amount of data to be analyzed has increased proportionally. Yet, as Moore's Law continues to slow, computational biologists can no longer rely on computing hardware to compensate for the ever-increasing size of biological datasets. In a field where many researchers are primarily focused on biological analysis over computational optimization, the unfortunate solution to this problem is often to simply buy larger and faster machines. Here, we introduce Seq, the first language tailored specifically to bioinformatics, which marries the ease and productivity of Python with C-like performance. Seq starts with a subset of Python (and is in many cases a drop-in replacement), yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. We evaluated Seq on several standard computational genomics tasks like reverse complementation, k-mer manipulation, sequence pattern matching and large genomic index queries. On equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 160× improvement once domain-specific language features and optimizations are used. With parallelism, we demonstrate up to a 650× improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2× improvement, and with shorter, cleaner code. Thus, Seq opens the door to an age of democratization of highly-optimized bioinformatics software.
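Two of the benchmark tasks named in this abstract, reverse complementation and k-mer manipulation, can be illustrated in ordinary Python. The sketch below is plain CPython written for this listing, assuming uppercase DNA input; it is not Seq code, does not use Seq's sequence or k-mer types, and is not the benchmark from the paper. The function names are hypothetical.

```python
# Plain-Python illustration of two tasks named in the abstract.
# Assumption: input is uppercase DNA over the alphabet A, C, G, T, N.
# This is not Seq code and not the paper's benchmark.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def canonical_kmers(seq: str, k: int) -> list:
    """For each k-mer keep the lexicographically smaller of the k-mer
    and its reverse complement, so both strands map to one key."""
    return [min(km, reverse_complement(km)) for km in kmers(seq, k)]

if __name__ == "__main__":
    print(reverse_complement("ACGTT"))
    print(list(kmers("ACGT", 2)))
    print(canonical_kmers("ACGT", 2))
```

Loops like these run in the interpreter one k-mer at a time, which is the kind of CPython baseline the abstract compares against.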

Numanagić I, Gökkaya AS, Zhang L, Berger B, Alkan C, Hach F. Fast characterization of segmental duplications in genome assemblies. Bioinformatics 2018;34:i706-i714. [PMID: 30423092 PMCID: PMC6129265 DOI: 10.1093/bioinformatics/bty586] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 01/07/2023]

Abstract

Motivation

Segmental duplications (SDs) or low-copy repeats, are segments of DNA > 1 Kbp with high sequence identity that are copied to other regions of the genome. SDs are among the most important sources of evolution, a common cause of genomic structural variation and several are associated with diseases of genomic origin including schizophrenia and autism. Despite their functional importance, SDs present one of the major hurdles for de novo genome assembly due to the ambiguity they cause in building and traversing both state-of-the-art overlap-layout-consensus and de Bruijn graphs. This causes SD regions to be misassembled, collapsed into a unique representation, or completely missing from assembled reference genomes for various organisms. In turn, this missing or incorrect information limits our ability to fully understand the evolution and the architecture of the genomes. Despite the essential need to accurately characterize SDs in assemblies, there has been only one tool that was developed for this purpose, called Whole-Genome Assembly Comparison (WGAC); its primary goal is SD detection. WGAC is comprised of several steps that employ different tools and custom scripts, which makes this strategy difficult and time consuming to use. Thus there is still a need for algorithms to characterize within-assembly SDs quickly, accurately, and in a user friendly manner.

Results

Here we introduce SEgmental Duplication Evaluation Framework (SEDEF) to rapidly detect SDs through sophisticated filtering strategies based on Jaccard similarity and local chaining. We show that SEDEF accurately detects SDs while maintaining substantial speed up over WGAC that translates into practical run times of minutes instead of weeks. Notably, our algorithm captures up to 25% 'pairwise error' between segments, whereas previous studies focused on only 10%, allowing us to more deeply track the evolutionary history of the genome.
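The Jaccard similarity mentioned in this abstract is the size of the intersection of two sets divided by the size of their union. The Python sketch below applies it to k-mer sets as a cheap pre-filter before any alignment. It is a minimal illustration only: the abstract does not give SEDEF's k-mer size, thresholds, or sketching scheme, so `k`, `threshold`, and the function names are hypothetical, and exact set operations stand in for whatever approximation the tool uses.

```python
# Minimal illustration of Jaccard similarity over k-mer sets.
# Assumptions (not from the abstract): k=4, threshold=0.5, exact sets.
# This is not SEDEF's implementation.

def kmer_set(seq: str, k: int) -> set:
    """Return the set of distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """|a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def passes_filter(s1: str, s2: str, k: int = 4, threshold: float = 0.5) -> bool:
    """Keep a candidate pair only if its k-mer sets are similar enough
    to be worth passing to a slower alignment or chaining step."""
    return jaccard(kmer_set(s1, k), kmer_set(s2, k)) >= threshold

if __name__ == "__main__":
    left = "ACGTACGTTAGC"
    right = "ACGTACGTTAGG"   # differs from left in the last base only
    print(jaccard(kmer_set(left, 4), kmer_set(right, 4)))
    print(passes_filter(left, right))
```

A filter of this kind trades a small risk of discarding true pairs for avoiding quadratic-time alignment on most candidates; a lower threshold admits more divergent pairs.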

Availability and implementation

SEDEF is available at https://github.com/vpc-ccg/sedef.

Lin YY, Gawronski A, Hach F, Li S, Numanagić I, Sarrafi I, Mishra S, McPherson A, Collins CC, Radovich M, Tang H, Sahinalp SC. Computational identification of micro-structural variations and their proteogenomic consequences in cancer. Bioinformatics 2018;34:1672-1681. [PMID: 29267878 PMCID: PMC5946953 DOI: 10.1093/bioinformatics/btx807] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/25/2017] [Revised: 11/24/2017] [Accepted: 12/15/2017] [Indexed: 12/18/2022]

Abstract

Motivation

Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples.

Results

We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure.

Availability and implementation

MiStrVar is available for download at https://bitbucket.org/compbio/mistrvar, and ProTIE is available at https://bitbucket.org/compbio/protie.

Contact

cenksahi@indiana.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

Shajii A, Numanagić I, Berger B. Latent Variable Model for Aligning Barcoded Short-Reads Improves Downstream Analyses. Res Comput Mol Biol 2018;10812:280-282. [PMID: 29888346 PMCID: PMC5989713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]

Numanagić I, Malikić S, Ford M, Qin X, Toji L, Radovich M, Skaar TC, Pratt VM, Berger B, Scherer S, Sahinalp SC. Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes. Nat Commun 2018;9:828. [PMID: 29483503 PMCID: PMC5826927 DOI: 10.1038/s41467-018-03273-1] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 05/10/2017] [Accepted: 02/01/2018] [Indexed: 12/30/2022]

Kavak P, Lin YY, Numanagić I, Asghari H, Güngör T, Alkan C, Hach F. Discovery and genotyping of novel sequence insertions in many sequenced individuals. Bioinformatics 2017;33:i161-i169. [PMID: 28881988 PMCID: PMC5870608 DOI: 10.1093/bioinformatics/btx254] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/17/2022]

Numanagić I, Malikić S, Pratt VM, Skaar TC, Flockhart DA, Sahinalp SC. Cypiripi: exact genotyping of CYP2D6 using high-throughput sequencing data. Bioinformatics 2015;31:i27-34. [PMID: 26072492 PMCID: PMC4542776 DOI: 10.1093/bioinformatics/btv232] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 11/29/2022]

Affiliation(s)

Ibrahim Numanagić School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Department of Medicine, Division of Clinical Pharmacology, Indiana University School of Medicine, Indianapolis, IN 46202, USA and School of Informatics and Computing, Indiana University, Bloomington, IN 47401, USA
Salem Malikić School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Department of Medicine, Division of Clinical Pharmacology, Indiana University School of Medicine, Indianapolis, IN 46202, USA and School of Informatics and Computing, Indiana University, Bloomington, IN 47401, USA
Victoria M Pratt School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Department of Medicine, Division of Clinical Pharmacology, Indiana University School of Medicine, Indianapolis, IN 46202, USA and School of Informatics and Computing, Indiana University, Bloomington, IN 47401, USA
Todd C Skaar School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Department of Medicine, Division of Clinical Pharmacology, Indiana University School of Medicine, Indianapolis, IN 46202, USA and School of Informatics and Computing, Indiana University, Bloomington, IN 47401, USA
David A Flockhart School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Department of Medicine, Division of Clinical Pharmacology, Indiana University School of Medicine, Indianapolis, IN 46202, USA and School of Informatics and Computing, Indiana University, Bloomington, IN 47401, USA
S Cenk Sahinalp School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Department of Medicine, Division of Clinical Pharmacology, Indiana University School of Medicine, Indianapolis, IN 46202, USA and School of Informatics and Computing, Indiana University, Bloomington, IN 47401, USA

Dao P, Numanagić I, Lin YY, Hach F, Karakoc E, Donmez N, Collins C, Eichler EE, Sahinalp SC. ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms. Bioinformatics 2013;30:644-51. [PMID: 24130305 DOI: 10.1093/bioinformatics/btt591] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]