Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Yuan X, Yu J, Xi J, Yang L, Shang J, Li Z, Duan J. CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2021;18:539-549. [PMID: 31180897 DOI: 10.1109/tcbb.2019.2920889] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

For:	Yuan X, Yu J, Xi J, Yang L, Shang J, Li Z, Duan J. CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2021;18:539-549. [PMID: 31180897 DOI: 10.1109/tcbb.2019.2920889] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

Number

Cited by Other Article(s)

Xie K, Ge X, Alvi HAK, Liu K, Song J, Yu Q. OTSUCNV: an adaptive segmentation and OTSU-based anomaly classification method for CNV detection using NGS data. BMC Genomics 2024;25:126. [PMID: 38291375 PMCID: PMC10826217 DOI: 10.1186/s12864-024-10018-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Accepted: 01/15/2024] [Indexed: 02/01/2024] Open

Zhang Y, Liu W, Duan J. On the core segmentation algorithms of copy number variation detection tools. Brief Bioinform 2024;25:bbae022. [PMID: 38340093 PMCID: PMC10858679 DOI: 10.1093/bib/bbae022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Revised: 10/26/2023] [Indexed: 02/12/2024] Open

Cabello-Aguilar S, Vendrell JA, Solassol J. A Bioinformatics Toolkit for Next-Generation Sequencing in Clinical Oncology. Curr Issues Mol Biol 2023;45:9737-9752. [PMID: 38132454 PMCID: PMC10741970 DOI: 10.3390/cimb45120608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 11/28/2023] [Accepted: 12/02/2023] [Indexed: 12/23/2023] Open

Li C, Fan S, Zhao H, Liu X. CNV-FB: A Feature bagging strategy-based approach to detect copy number variants from NGS data. J Bioinform Comput Biol 2023;21:2350026. [PMID: 38212874 DOI: 10.1142/s0219720023500269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2024]

Adaptive Savitzky–Golay Filters for Analysis of Copy Number Variation Peaks from Whole-Exome Sequencing Data. INFORMATION 2023. [DOI: 10.3390/info14020128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/18/2023] Open

Abstract Copy number variation (CNV) is a form of structural variation in the human genome that provides medical insight into complex human diseases; while whole-genome sequencing is becoming more affordable, whole-exome sequencing (WES) remains an important tool in clinical diagnostics. Because of its discontinuous nature and unique characteristics of sparse target-enrichment-based WES data, the analysis and detection of CNV peaks remain difficult tasks. The Savitzky–Golay (SG) smoothing is well known as a fast and efficient smoothing method. However, no study has documented the use of this technique for CNV peak detection. It is well known that the effectiveness of the classical SG filter depends on the proper selection of the window length and polynomial degree, which should correspond with the scale of the peak because, in the case of peaks with a high rate of change, the effectiveness of the filter could be restricted. Based on the Savitzky–Golay algorithm, this paper introduces a novel adaptive method to smooth irregular peak distributions. The proposed method ensures high-precision noise reduction by dynamically modifying the results of the prior smoothing to automatically adjust parameters. Our method offers an additional feature extraction technique based on density and Euclidean distance. In comparison to classical Savitzky–Golay filtering and other peer filtering methods, the performance evaluation demonstrates that adaptive Savitzky–Golay filtering performs better. According to experimental results, our method effectively detects CNV peaks across all genomic segments for both short and long tags, with minimal peak height fidelity values (i.e., low estimation bias). As a result, we clearly demonstrate how well the adaptive Savitzky–Golay filtering method works and how its use in the detection of CNV peaks can complement the existing techniques used in CNV peak analysis. Collapse

Liu G, Yang H, Yuan X. A shortest path-based approach for copy number variation detection from next-generation sequencing data. Front Genet 2023;13:1084974. [PMID: 36733945 PMCID: PMC9887524 DOI: 10.3389/fgene.2022.1084974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2022] [Accepted: 12/27/2022] [Indexed: 01/18/2023] Open

Abstract

Copy number variation (CNV) is one of the main structural variations in the human genome and accounts for a considerable proportion of variations. As CNVs can directly or indirectly cause cancer, mental illness, and genetic disease in humans, their effective detection in humans is of great interest in the fields of oncogene discovery, clinical decision-making, bioinformatics, and drug discovery. The advent of next-generation sequencing data makes CNV detection possible, and a large number of CNV detection tools are based on next-generation sequencing data. Due to the complexity (e.g., bias, noise, alignment errors) of next-generation sequencing data and CNV structures, the accuracy of existing methods in detecting CNVs remains low. In this work, we design a new CNV detection approach, called shortest path-based Copy number variation (SPCNV), to improve the detection accuracy of CNVs. SPCNV calculates the k nearest neighbors of each read depth and defines the shortest path, shortest path relation, and shortest path cost sets based on which further calculates the mean shortest path cost of each read depth and its k nearest neighbors. We utilize the ratio between the mean shortest path cost for each read depth and the mean of the mean shortest path cost of its k nearest neighbors to construct a relative shortest path score formula that is able to determine a score for each read depth. Based on the score profile, a boxplot is then applied to predict CNVs. The performance of the proposed method is verified by simulation data experiments and compared against several popular methods of the same type. Experimental results show that the proposed method achieves the best balance between recall and precision in each set of simulated samples. To further verify the performance of the proposed method in real application scenarios, we then select real sample data from the 1,000 Genomes Project to conduct experiments. The proposed method achieves the best F1-scores in almost all samples. Therefore, the proposed method can be used as a more reliable tool for the routine detection of CNVs.

Collapse

Zhang T, Dong J, Jiang H, Zhao Z, Zhou M, Yuan T. CNV-PCC: An efficient method for detecting copy number variations from next-generation sequencing data. Front Bioeng Biotechnol 2022;10:1000638. [DOI: 10.3389/fbioe.2022.1000638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open

WAVECNV: A New Approach for Detecting Copy Number Variation by Wavelet Clustering. MATHEMATICS 2022. [DOI: 10.3390/math10122151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Wang X, Junqing L, Huang T. CNVABNN: An AdaBoost algorithm and neural networks-based detection of copy number variations from NGS data. Comput Biol Chem 2022;99:107720. [DOI: 10.1016/j.compbiolchem.2022.107720] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2022] [Revised: 06/22/2022] [Accepted: 06/23/2022] [Indexed: 11/03/2022]

svBreak: A New Approach for the Detection of Structural Variant Breakpoints Based on Convolutional Neural Network. BIOMED RESEARCH INTERNATIONAL 2022;2022:7196040. [PMID: 35345526 PMCID: PMC8957449 DOI: 10.1155/2022/7196040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/12/2021] [Revised: 01/04/2022] [Accepted: 01/27/2022] [Indexed: 12/01/2022]

Agaoglu NB, Unal B, Akgun Dogan O, Zolfagharian P, Shairfli P, Karakurt A, Can Senay B, Kizilboga T, Yildiz J, Dinler Doganay G, Doganay L. Determining the Accuracy of Next Generation Sequencing Based Copy Number Variation Analysis in Hereditary Breast and Ovarian Cancer. Expert Rev Mol Diagn 2022;22:239-246. [PMID: 35240897 DOI: 10.1080/14737159.2022.2048373] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]

Affiliation(s)

Nihat Bugra Agaoglu Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.,Department of Medical Genetics, Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey
Busra Unal Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey
Ozlem Akgun Dogan Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.,Department of Pediatric Genetics, Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey
Payam Zolfagharian Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey
Pari Shairfli Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey
Aylin Karakurt Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey
Burak Can Senay Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey
Tugba Kizilboga Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.,Department of Molecular Biology and Genetics, Istanbul Technical University, Istanbul, Turkey
Jale Yildiz Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.,Department of Molecular Biology and Genetics, Istanbul Technical University, Istanbul, Turkey
Gizem Dinler Doganay Department of Molecular Biology and Genetics, Istanbul Technical University, Istanbul, Turkey
Levent Doganay Genomic Laboratory (GLAB), Umraniye Training and Research Hospital, University of Health Sciences, Istanbul, Turkey

Collapse

Obtaining spatially resolved tumor purity maps using deep multiple instance learning in a pan-cancer study. PATTERNS (NEW YORK, N.Y.) 2022;3:100399. [PMID: 35199060 PMCID: PMC8848022 DOI: 10.1016/j.patter.2021.100399] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Revised: 09/07/2021] [Accepted: 11/03/2021] [Indexed: 02/07/2023]

Abstract

Tumor purity is the percentage of cancer cells within a tissue section. Pathologists estimate tumor purity to select samples for genomic analysis by manually reading hematoxylin-eosin (H&E)-stained slides, which is tedious, time consuming, and prone to inter-observer variability. Besides, pathologists' estimates do not correlate well with genomic tumor purity values, which are inferred from genomic data and accepted as accurate for downstream analysis. We developed a deep multiple instance learning model predicting tumor purity from H&E-stained digital histopathology slides. Our model successfully predicted tumor purity in eight The Cancer Genome Atlas (TCGA) cohorts and a local Singapore cohort. The predictions were highly consistent with genomic tumor purity values. Thus, our model can be utilized to select samples for genomic analysis, which will help reduce pathologists' workload and decrease inter-observer variability. Furthermore, our model provided tumor purity maps showing the spatial variation within sections. They can help better understand the tumor microenvironment.

•

MIL model successfully predicts a sample's tumor purity from histopathology slides

•

MIL model learns to spatially resolve tumor purity from sample-level labels

•

Tumor purity varies spatially within a sample

•

Pathologists’ region selection is vital for correct percentage tumor nuclei estimation

Given some big data and coarse-level labels, extracting fine-level information is a demanding yet rewarding challenge in data science. This study develops a machine learning model utilizing big data and exploiting coarse-level labels to reveal fine-level details within the data. Although it can be applied to different data science tasks with enormous data and coarse labels, we applied it to a computational histopathology task with gigapixel histopathology slides and sample-level labels. Specifically, the model revealed spatial resolution of tumor purity within histopathology slides using only sample-level genomic tumor purity values during training. This can also be extended to other omics features, providing precious information about cancer biology and promising personalized, precision medicine. Such studies are of great clinical importance in discovering imaging biomarkers and better understanding the tumor microenvironment.

Collapse

Lee WP, Zhu Q, Yang X, Liu S, Cerveira E, Ryan M, Mil-Homens A, Bellfy L, Ye K, Lee C, Zhang C. JAX-CNV: A Whole-genome Sequencing-based Algorithm for Copy Number Detection at Clinical Grade Level. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022;20:1197-1206. [PMID: 35085778 DOI: 10.1016/j.gpb.2021.06.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/20/2021] [Revised: 04/30/2021] [Accepted: 09/06/2021] [Indexed: 10/19/2022]

Affiliation(s)

Wan-Ping Lee Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; School of Cyber Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Qihui Zhu The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Xiaofei Yang The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Silvia Liu The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Eliza Cerveira The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Mallory Ryan The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Adam Mil-Homens The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Lauren Bellfy The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Kai Ye Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China; MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Charles Lee Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; Department of Life Sciences, Ewha Womans University, Seoul 03760, South Korea
Chengsheng Zhang Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an 710061, China; The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.

Collapse

Xie K, Liu K, Alvi HAK, Chen Y, Wang S, Yuan X. KNNCNV: A K-Nearest Neighbor Based Method for Detection of Copy Number Variations Using NGS Data. Front Cell Dev Biol 2022;9:796249. [PMID: 35004691 PMCID: PMC8728060 DOI: 10.3389/fcell.2021.796249] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2021] [Accepted: 11/23/2021] [Indexed: 11/19/2022] Open

Liu Y, Xie G, Li A, He Z, Hei X. Prediction of Cancer-Related piRNAs Based on Network-Based Stratification Analysis. INT J PATTERN RECOGN 2021. [DOI: 10.1142/s0218001422590029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Yuan X, Ma C, Zhao H, Yang L, Wang S, Xi J. STIC: Predicting Single Nucleotide Variants and Tumor Purity in Cancer Genome. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:2692-2701. [PMID: 32086221 DOI: 10.1109/tcbb.2020.2975181] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Huang T, Li J, Jia B, Sang H. CNV-MEANN: A Neural Network and Mind Evolutionary Algorithm-Based Detection of Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021;12:700874. [PMID: 34484298 PMCID: PMC8415314 DOI: 10.3389/fgene.2021.700874] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Accepted: 07/19/2021] [Indexed: 11/20/2022] Open

Filer DL, Kuo F, Brandt AT, Tilley CR, Mieczkowski PA, Berg JS, Robasky K, Li Y, Bizon C, Tilson JL, Powell BC, Bost DM, Jeffries CD, Wilhelmsen KC. Pre-capture multiplexing provides additional power to detect copy number variation in exome sequencing. BMC Bioinformatics 2021;22:374. [PMID: 34284719 PMCID: PMC8293537 DOI: 10.1186/s12859-021-04246-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2021] [Accepted: 05/18/2021] [Indexed: 11/10/2022] Open

Abstract

Background

As exome sequencing (ES) integrates into clinical practice, we should make every effort to utilize all information generated. Copy-number variation can lead to Mendelian disorders, but small copy-number variants (CNVs) often get overlooked or obscured by under-powered data collection. Many groups have developed methodology for detecting CNVs from ES, but existing methods often perform poorly for small CNVs and rely on large numbers of samples not always available to clinical laboratories. Furthermore, methods often rely on Bayesian approaches requiring user-defined priors in the setting of insufficient prior knowledge. This report first demonstrates the benefit of multiplexed exome capture (pooling samples prior to capture), then presents a novel detection algorithm, mcCNV (“multiplexed capture CNV”), built around multiplexed capture.

Results

We demonstrate: (1) multiplexed capture reduces inter-sample variance; (2) our mcCNV method, a novel depth-based algorithm for detecting CNVs from multiplexed capture ES data, improves the detection of small CNVs. We contrast our novel approach, agnostic to prior information, with the the commonly-used ExomeDepth. In a simulation study mcCNV demonstrated a favorable false discovery rate (FDR). When compared to calls made from matched genome sequencing, we find the mcCNV algorithm performs comparably to ExomeDepth.

Conclusion

Implementing multiplexed capture increases power to detect single-exon CNVs. The novel mcCNV algorithm may provide a more favorable FDR than ExomeDepth. The greatest benefits of our approach derive from (1) not requiring a database of reference samples and (2) not requiring prior information about the prevalance or size of variants.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04246-w.

Collapse

Zhao HY, Li Q, Tian Y, Chen YH, Alvi HAK, Yuan XG. CIRCNV: Detection of CNVs Based on a Circular Profile of Read Depth from Sequencing Data. BIOLOGY 2021;10:biology10070584. [PMID: 34202028 PMCID: PMC8301091 DOI: 10.3390/biology10070584] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/29/2021] [Revised: 06/10/2021] [Accepted: 06/21/2021] [Indexed: 12/29/2022]

Abstract

Simple Summary

In this study, we propose a copy number variation (CNV) detection method called CIRCNV, which is based on a circular profile of the read depth from sequencing data. The proposed method is an extended version of our previously developed method CNV-LOF. The main difference of CIRCNV from CNV-LOF lies in its two new features: (1) it transfers the read depth profile from a line shape to a circular shape via a polar coordinate transformation to generate a meaningful two-dimensional dataset for CNV analysis and promote fairness between the ends and middle part of the genome, and (2) it performs two rounds of CNV declaration via estimating tumor purity and recovering the truth circular RD profile. We test and evaluate the performance of CIRCNV via conducting simulation studies and real sequencing tumor sample applications. The experimental results show that CIRCNV outperforms peer methods with respect to sensitivity, precision, and the F1-score. The experiments prove that the proposed method is a reliable and effective tool in the field of variation analysis of tumor genomes.

Abstract

Copy number variation (CNV) is a common type of structural variation in the human genome. Accurate detection of CNVs from tumor genomes can provide crucial information for the study of tumor genesis and cancer precision diagnosis. However, the contamination of normal genomes in tumor genomes and the crude profiles of the read depth make such a task difficult. In this paper, we propose an alternative approach, called CIRCNV, for the detection of CNVs from sequencing data. CIRCNV is an extension of our previously developed method CNV-LOF, which uses local outlier factors to predict CNVs. Comparatively, CIRCNV can be performed on individual tumor samples and has the following two new features: (1) it transfers the read depth profile from a line shape to a circular shape via a polar coordinate transformation, in order to improve the efficiency of the read depth (RD) profile for the detection of CNVs; and (2) it performs a second round of CNV declaration based on the truth circular RD profile, which is recovered by estimating tumor purity. We test and validate the performance of CIRCNV based on simulation and real sequencing data and perform comparisons with several peer methods. The results demonstrate that CIRCNV can obtain superior performance in terms of sensitivity and precision. We expect that our proposed method will be a supplement to existing methods and become a routine tool in the field of variation analysis of tumor genomes.

Collapse

Guo Y, Wang S, Yuan X. HBOS-CNV: A New Approach to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021;12:642473. [PMID: 34163521 PMCID: PMC8215577 DOI: 10.3389/fgene.2021.642473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2021] [Accepted: 05/05/2021] [Indexed: 11/13/2022] Open

He Z, Zhang J, Yuan X, Zhang Y. Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods. Front Genet 2021;11:632901. [PMID: 33537063 PMCID: PMC7848170 DOI: 10.3389/fgene.2020.632901] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2020] [Accepted: 12/30/2020] [Indexed: 12/13/2022] Open

Abstract

Breast cancer is the most common malignancy in women, and because it has a high mortality rate, it is urgent to develop computational methods to increase the accuracy of breast cancer survival predictive models. Although multi-omics data such as gene expression have been extensively used in recent studies, the accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and its effect on the prognosis of breast cancer remains to be further explored. Meanwhile, these omics datasets are high-dimensional and redundant. Therefore, we adopted multiple kernel learning (MKL) to efficiently integrate somatic mutation to currently molecular data including gene expression, copy number variation (CNV), methylation, and protein expression data for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was utilized to select features that present high relevance to survival and low redundancy among themselves for each type of data. The experimental results demonstrated that the proposed method achieved the most optimal performance and there was a remarkable improvement in the prediction performance when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival predictions. Moreover, mRMR was superior to other feature selection methods used in previous studies. Furthermore, MKL outperformed the other traditional classifiers in multi-omics data integration. Our analysis indicated that through employing promising omics data such as somatic mutations and harnessing the power of proper feature selection methods and effective integration frameworks, the breast cancer survival predictive accuracy can be further increased, thereby providing a more optimal clinical diagnosis and more effective treatment for breast cancer patients.

Collapse

Copy Number Variation: Methods and Clinical Applications. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11020819] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]

Xie K, Tian Y, Yuan X. A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2021;11:632311. [PMID: 33519925 PMCID: PMC7838601 DOI: 10.3389/fgene.2020.632311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2020] [Accepted: 12/21/2020] [Indexed: 11/29/2022] Open

Liu G, Zhang J, Yuan X, Wei C. RKDOSCNV: A Local Kernel Density-Based Approach to the Detection of Copy Number Variations by Using Next-Generation Sequencing Data. Front Genet 2020;11:569227. [PMID: 33329705 PMCID: PMC7673372 DOI: 10.3389/fgene.2020.569227] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2020] [Accepted: 09/04/2020] [Indexed: 12/04/2022] Open

Dong J, Qi M, Wang S, Yuan X. DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads. Front Genet 2020;11:924. [PMID: 32849857 PMCID: PMC7433346 DOI: 10.3389/fgene.2020.00924] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2020] [Accepted: 07/24/2020] [Indexed: 11/21/2022] Open

Zhao H, Huang T, Li J, Liu G, Yuan X. MFCNV: A New Method to Detect Copy Number Variations From Next-Generation Sequencing Data. Front Genet 2020;11:434. [PMID: 32499814 PMCID: PMC7243272 DOI: 10.3389/fgene.2020.00434] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2020] [Accepted: 04/08/2020] [Indexed: 11/13/2022] Open

Abstract

Copy number variation (CNV) is a very important phenomenon in tumor genomes and plays a significant role in tumor genesis. Accurate detection of CNVs has become a routine and necessary procedure for a deep investigation of tumor cells and diagnosis of tumor patients. Next-generation sequencing (NGS) technique has provided a wealth of data for the detection of CNVs at base-pair resolution. However, such task is usually influenced by a number of factors, including GC-content bias, sequencing errors, and correlations among adjacent positions within CNVs. Although many existing methods have dealt with some of these artifacts by designing their own strategies, there is still a lack of comprehensive consideration of all the factors. In this paper, we propose a new method, MFCNV, for an accurate detection of CNVs from NGS data. Compared with existing methods, the characteristics of the proposed method include the following: (1) it makes a full consideration of the intrinsic correlations among adjacent positions in the genome to be analyzed, (2) it calculates read depth, GC-content bias, base quality, and correlation value for each genome bin and combines them as multiple features for the evaluation of genome bins, and (3) it addresses the joint effect among the factors via training a neural network algorithm for the prediction of CNVs. We test the performance of the MFCNV method by using simulation and real sequencing data and make comparisons with several peer methods. The results demonstrate that our method is superior to other methods in terms of sensitivity, precision, and F1-score and can detect many CNVs that other methods have not discovered. MFCNV is expected to be a complementary tool in the analysis of mutations in tumor genomes and can be extended to be applied to the analysis of single-cell sequencing data.

Collapse

Yuan X, Li Z, Zhao H, Bai J, Zhang J. Accurate Inference of Tumor Purity and Absolute Copy Numbers From High-Throughput Sequencing Data. Front Genet 2020;11:458. [PMID: 32425990 PMCID: PMC7205152 DOI: 10.3389/fgene.2020.00458] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2020] [Accepted: 04/14/2020] [Indexed: 02/06/2023] Open