1
Li C, Chen L, Pan G, Zhang W, Li SC. Deciphering complex breakage-fusion-bridge genome rearrangements with Ambigram. Nat Commun 2023; 14:5528. [PMID: 37684230 PMCID: PMC10491683 DOI: 10.1038/s41467-023-41259-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 08/28/2023] [Indexed: 09/10/2023]
Abstract
Breakage-fusion-bridge (BFB) is a complex rearrangement that leads to tumor malignancy. Existing models for detecting BFBs rely on the ideal BFB hypothesis, ruling out the possibility of BFBs entangled with other structural variations, that is, complex BFBs. We propose an algorithm, Ambigram, to identify complex BFBs and reconstruct the rearranged structure of the local genome during cancer subclone evolution. Ambigram handles data from short-read, linked-read, long-read, and single-cell sequencing, as well as optical mapping technologies. Compared with state-of-the-art methods, Ambigram successfully deciphers gold- or silver-standard complex BFBs in multiple cancers. Ambigram dissects the intratumor heterogeneity of complex BFB events with single-cell reads from melanoma and gastric cancer. Furthermore, applying Ambigram to liver and cervical cancer data suggests that the BFB mechanism may mediate oncovirus integrations. BFB also occurs in non-cancer genomes. Investigating the complete human genome reference with Ambigram suggests that the BFB mechanism may be involved in two genome reorganizations of Homo sapiens during evolution. Moreover, Ambigram discovers signals of recurrent foldback inversions and of complex BFBs in whole-genome data from the 1000 Genomes Project and from congenital heart diseases, respectively.
Affiliation(s)
- Chaohui Li
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Lingxi Chen
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Guangze Pan
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Wenqian Zhang
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Shuai Cheng Li
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China

2
Zhang L, Chen L, Li SC, Wang M, Li C, Song T, Ni Y, Yang Y, Liu Z, Yao M, Shen B, Li W. Heterogeneity in lung cancers by single-cell DNA sequencing. Clin Transl Med 2023; 13:e1388. [PMID: 37649132 PMCID: PMC10468563 DOI: 10.1002/ctm2.1388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 08/11/2023] [Accepted: 08/17/2023] [Indexed: 09/01/2023]
Affiliation(s)
- Li Zhang
  - Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Health, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease‐related Molecular Network, Precision Medicine Key Laboratory of Sichuan Province, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China
- Lingxi Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon, China
- Shuai Cheng Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, China
- Mengyao Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon, China
- Chaohui Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, China
- Tingting Song
  - Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Health, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease‐related Molecular Network, Precision Medicine Key Laboratory of Sichuan Province, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China
- Yinyun Ni
  - Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Health, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease‐related Molecular Network, Precision Medicine Key Laboratory of Sichuan Province, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China
- Ying Yang
  - Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Health, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease‐related Molecular Network, Precision Medicine Key Laboratory of Sichuan Province, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China
- Zhiqiang Liu
  - Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Health, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease‐related Molecular Network, Precision Medicine Key Laboratory of Sichuan Province, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China
- Menglin Yao
  - Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Health, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease‐related Molecular Network, Precision Medicine Key Laboratory of Sichuan Province, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China
- Bairong Shen
  - Institutes for Systems Genetics, Frontiers Science Center for Disease‐Related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Weimin Li
  - Department of Pulmonary and Critical Care Medicine, Institute of Respiratory Health, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease‐related Molecular Network, Precision Medicine Key Laboratory of Sichuan Province, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China

3
Adaptive Savitzky–Golay Filters for Analysis of Copy Number Variation Peaks from Whole-Exome Sequencing Data. INFORMATION 2023. [DOI: 10.3390/info14020128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Copy number variation (CNV) is a form of structural variation in the human genome that provides medical insight into complex human diseases. While whole-genome sequencing is becoming more affordable, whole-exome sequencing (WES) remains an important tool in clinical diagnostics. Because of the discontinuous nature and unique characteristics of sparse, target-enrichment-based WES data, the analysis and detection of CNV peaks remain difficult tasks. Savitzky–Golay (SG) smoothing is well known as a fast and efficient smoothing method. However, no study has documented the use of this technique for CNV peak detection. The effectiveness of the classical SG filter depends on the proper selection of the window length and polynomial degree, which should correspond to the scale of the peak, because the effectiveness of the filter can be restricted for peaks with a high rate of change. Based on the Savitzky–Golay algorithm, this paper introduces a novel adaptive method to smooth irregular peak distributions. The proposed method ensures high-precision noise reduction by dynamically modifying the results of the prior smoothing to automatically adjust parameters. Our method offers an additional feature extraction technique based on density and Euclidean distance. The performance evaluation demonstrates that adaptive Savitzky–Golay filtering performs better than classical Savitzky–Golay filtering and other peer filtering methods. According to experimental results, our method effectively detects CNV peaks across all genomic segments for both short and long tags, with minimal peak height fidelity values (i.e., low estimation bias). As a result, we demonstrate how well the adaptive Savitzky–Golay filtering method works and how its use in the detection of CNV peaks can complement the existing techniques used in CNV peak analysis.
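As a point of reference for the classical filter discussed in the abstract above, the following is a minimal sketch of fixed-window Savitzky–Golay smoothing. It is not the adaptive method the paper proposes. The function name `savgol5`, the edge policy, and the toy signal are assumptions made for illustration. The sketch uses the standard 5-point quadratic convolution weights (-3, 12, 17, 12, -3)/35.

```python
# Classical 5-point, degree-2 Savitzky-Golay smoothing with a fixed window.
# Illustrative only: the adaptive variant described in the abstract is not shown.

SG5_COEFFS = (-3, 12, 17, 12, -3)  # standard 5-point quadratic weights
SG5_NORM = 35                      # normalisation constant for those weights


def savgol5(signal):
    """Smooth a sequence with a fixed 5-point quadratic Savitzky-Golay filter.

    The first and last two samples are copied through unchanged, a simple
    edge policy chosen here for brevity (an assumption, not from the paper).
    """
    values = [float(v) for v in signal]
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5_COEFFS, window)) / SG5_NORM
    return smoothed


# Hypothetical signal: a single sharp peak of height 35 on a flat baseline.
peak = [0, 0, 0, 35, 0, 0, 0]
print(savgol5(peak))  # the peak sample drops from 35 to 17
```

In this toy case the sharp peak loses about half its height, which illustrates why a fixed window length can distort peaks with a high rate of change.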
4
Chen L, Li S. Incorporating cell hierarchy to decipher the functional diversity of single cells. Nucleic Acids Res 2023; 51:e9. [PMID: 36373664 PMCID: PMC9881154 DOI: 10.1093/nar/gkac1044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Revised: 10/13/2022] [Accepted: 10/21/2022] [Indexed: 11/16/2022]
Abstract
Cells possess functional diversity hierarchically. However, most single-cell analyses neglect the nested structures while detecting and visualizing the functional diversity. Here, we incorporate cell hierarchy to study functional diversity at subpopulation, club (i.e., sub-subpopulation), and cell layers. Accordingly, we implement a package, SEAT, to construct cell hierarchies utilizing structure entropy by minimizing the global uncertainty in cell-cell graphs. With cell hierarchies, SEAT deciphers functional diversity in 36 datasets covering scRNA, scDNA, scATAC, and scRNA-scATAC multiome. First, SEAT finds optimal cell subpopulations with high clustering accuracy. It identifies cell types or fates from omics profiles and boosts accuracy from 0.34 to 1. Second, SEAT detects insightful functional diversity among cell clubs. The hierarchy of breast cancer cells reveals that the specific tumor cell club drives AREG-EGFR signaling. We identify a dense co-accessibility network of cis-regulatory elements specified by one cell club in GM12878. Third, the cell order from the hierarchy infers periodic pseudo-time of cells, improving accuracy from 0.79 to 0.89. Moreover, we incorporate cell hierarchy layers as prior knowledge to refine nonlinear dimension reduction, enabling us to visualize hierarchical cell layouts in low-dimensional space.
Affiliation(s)
- Lingxi Chen
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, China
  - City University of Hong Kong Shenzhen Research Institute, Shenzhen, 518057, Guangdong, China
- Shuai Cheng Li
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, China
  - City University of Hong Kong Shenzhen Research Institute, Shenzhen, 518057, Guangdong, China

5
Wang X, Chen L, Liu W, Zhang Y, Liu D, Zhou C, Shi S, Dong J, Lai Z, Zhao B, Zhang W, Cheng H, Li S. TIMEDB: tumor immune micro-environment cell composition database with automatic analysis and interactive visualization. Nucleic Acids Res 2022; 51:D1417-D1424. [PMID: 36399488 PMCID: PMC9825442 DOI: 10.1093/nar/gkac1006] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/14/2022] [Revised: 09/30/2022] [Accepted: 10/21/2022] [Indexed: 11/19/2022]
Abstract
Deciphering the cell-type composition in the tumor immune microenvironment (TIME) can significantly increase the efficacy of cancer treatment and improve the prognosis of cancer. Such a task has benefited from microarrays and RNA sequencing technologies, which have been widely adopted in cancer studies, resulting in extensive expression profiles with clinical phenotypes across multiple cancers. Current state-of-the-art tools can infer cell-type composition from bulk expression profiles, providing the possibility of investigating the inter-heterogeneity and intra-heterogeneity of TIME across cancer types. Much can be gained from these tools in conjunction with a well-curated database of TIME cell-type composition data, accompanied by the corresponding clinical information. However, currently available databases fall short in data volume, multi-platform dataset integration, and tool integration. In this work, we introduce TIMEDB (https://timedb.deepomics.org), an online database for human tumor immune microenvironment cell-type composition estimated from bulk expression profiles. TIMEDB stores manually curated expression profiles, cell-type composition profiles, and the corresponding clinical information of a total of 39,706 samples from 546 datasets across 43 cancer types. TIMEDB comes readily equipped with online tools for automatic analysis and interactive visualization, and aims to serve the community as a convenient tool for investigating the human tumor microenvironment.
Affiliation(s)
- Yuanzheng Zhang
  - School of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, China
- Dawei Liu
  - School of Software, Northeastern University, Shenyang, Liaoning, China
- Chenxin Zhou
  - School of Business Administration, Northeastern University, Shenyang, Liaoning, China
- Shuai Shi
  - School of Creative Media, City University of Hong Kong, Hong Kong, China
- Jiajie Dong
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Zhengtao Lai
  - School of Business Administration, Northeastern University, Shenyang, Liaoning, China
- Bingran Zhao
  - School of Business Administration, Northeastern University, Shenyang, Liaoning, China
- Wenjingyu Zhang
  - School of Business Administration, Northeastern University, Shenyang, Liaoning, China
- Haoyue Cheng
  - Department of Clinical Pathology, Capital Medical University, Beijing, China
- Shuaicheng Li
  - To whom correspondence should be addressed. Tel: +852 3442 9412;

6
Wang R, Zhang Y, Wang M, Feng X, Wang J, Li SC. Resolving single-cell copy number profiling for large datasets. Brief Bioinform 2022; 23:6633647. [PMID: 35801503 DOI: 10.1093/bib/bbac264] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/16/2022] [Revised: 05/29/2022] [Accepted: 06/06/2022] [Indexed: 11/14/2022]
Abstract
The advances of single-cell DNA sequencing (scDNA-seq) enable us to characterize the genetic heterogeneity of cancer cells. However, the high noise and low coverage of scDNA-seq impede the estimation of copy number variations (CNVs). In addition, existing tools suffer from intensive execution time and often fail on large datasets. Here, we propose SeCNV, an efficient method that leverages structural entropy, to profile the copy numbers. SeCNV adopts a local Gaussian kernel to construct a matrix, depth congruent map (DCM), capturing the similarities between any two bins along the genome. Then, SeCNV partitions the genome into segments by minimizing the structural entropy from the DCM. With the partition, SeCNV estimates the copy numbers within each segment for cells. We simulate nine datasets with various breakpoint distributions and amplitudes of noise to benchmark SeCNV. SeCNV achieves a robust performance, i.e. the F1-scores are higher than 0.95 for breakpoint detections, significantly outperforming state-of-the-art methods. SeCNV successfully processes large datasets (>50 000 cells) within 4 min, while other tools fail to finish within the time limit, i.e. 120 h. We apply SeCNV to single-nucleus sequencing datasets from two breast cancer patients and acoustic cell tagmentation sequencing datasets from eight breast cancer patients. SeCNV successfully reproduces the distinct subclones and infers tumor heterogeneity. SeCNV is available at https://github.com/deepomicslab/SeCNV.
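The abstract above describes building a similarity matrix between genomic bins with a local Gaussian kernel. A minimal sketch of that idea follows. The kernel form, the bandwidth `sigma`, the locality parameter `band`, and the function name `local_gaussian_similarity` are all assumptions for illustration; SeCNV's actual depth congruent map may be defined differently, so the repository linked in the abstract remains the authoritative source.

```python
import math


def local_gaussian_similarity(depths, sigma=1.0, band=2):
    """Return an n x n similarity matrix for per-bin read depths.

    Bins i and j get exp(-(d_i - d_j)^2 / (2 * sigma^2)) when they lie within
    `band` positions of each other along the genome, and 0.0 otherwise.
    Both `sigma` and `band` are hypothetical tuning parameters.
    """
    n = len(depths)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Only compare bin i with neighbours inside the local band.
        for j in range(max(0, i - band), min(n, i + band + 1)):
            diff = depths[i] - depths[j]
            sim[i][j] = math.exp(-(diff * diff) / (2.0 * sigma * sigma))
    return sim


# Hypothetical depths: two flat regions with a copy-number shift between them.
depths = [2.0, 2.0, 2.0, 4.0, 4.0]
for row in local_gaussian_similarity(depths):
    print(["%.2f" % v for v in row])
```

Bins on the same side of the shift score close to 1 and bins across it score much lower, which is the kind of block structure a downstream partitioning step could exploit.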
Affiliation(s)
- Wang Ruohan
  - Department of Computer Science at City University of Hong Kong
- Zhang Yuwei
  - Department of Computer Science at City University of Hong Kong
- Wang Mengbo
  - Department of Computer Science at City University of Hong Kong
- Feng Xikang
  - School of Software, Northwestern Polytechnical University
- Wang Jianping
  - Department of Computer Science at City University of Hong Kong
- Li Shuai Cheng
  - Department of Computer Science at City University of Hong Kong

7
Feng X, Chen L. SCSilicon: a tool for synthetic single-cell DNA sequencing data generation. BMC Genomics 2022; 23:359. [PMID: 35546390 PMCID: PMC9092674 DOI: 10.1186/s12864-022-08566-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 04/19/2022] [Indexed: 11/25/2022]
Abstract
Background: Single-cell DNA sequencing is becoming indispensable in the study of cell-specific cancer genomics. Nevertheless, the performance of computational tools that tackle single-cell genome aberrations may be undervalued or overvalued, owing to the insufficient size of benchmarking data. In silico simulation is a cost-effective approach to generate as many single-cell genomes as needed in a controlled manner for reliable and valid benchmarking. Results: This study proposes a new tool, SCSilicon, which efficiently generates single-cell in silico DNA reads with minimal manual intervention. SCSilicon automatically creates a set of genomic aberrations, including SNP, SNV, Indel, and CNV. In addition, SCSilicon yields the ground truth of CNV segmentation breakpoints and subclone cell labels. We have manually inspected a series of synthetic variations. We conducted a sanity check of the state-of-the-art single-cell CNV callers and found SCYN was the most robust one. Conclusions: SCSilicon is a user-friendly software package for users to develop and benchmark single-cell CNV callers. Source code of SCSilicon is available at https://github.com/xikanfeng2/SCSilicon. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08566-w.
Affiliation(s)
- Xikang Feng
  - School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, China
- Lingxi Chen
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China

8
Feng X, Chen L, Qing Y, Li R, Li C, Li SC. SCYN: single cell CNV profiling method using dynamic programming. BMC Genomics 2021; 22:651. [PMID: 34789142 PMCID: PMC8596905 DOI: 10.1186/s12864-021-07941-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 08/20/2021] [Indexed: 11/11/2022]
Abstract
Background: Copy number variation is crucial in deciphering the mechanism and cure of complex disorders and cancers. Recent advances in scDNA sequencing technology shed light on intratumor heterogeneity, the detection of rare subclones, and the reconstruction of tumor evolution lineages at single-cell resolution. Nevertheless, the current circular binary segmentation based approach fails to efficiently and effectively identify copy number shifts on some exceptional trails. Results: Here, we propose SCYN, a CNV segmentation method powered by dynamic programming. SCYN resolves the precise segmentation on an in silico dataset. We then verified that SCYN infers copy numbers accurately on triple-negative breast cancer scDNA data, with array comparative genomic hybridization results of purified bulk samples as ground-truth validation. We tested SCYN on two datasets of the newly emerged 10x Genomics CNV solution. SCYN successfully recognizes gastric cancer cells from 1% and 10% spike-in 10x datasets. Moreover, SCYN is about 150 times faster than the state-of-the-art tool when dealing with datasets of approximately 2000 cells. Conclusions: SCYN robustly and efficiently detects segmentations and infers copy number profiles on single-cell DNA sequencing data. It serves to reveal tumor intra-heterogeneity. The source code of SCYN can be accessed at https://github.com/xikanfeng2/SCYN.
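To make the idea of segmentation by dynamic programming concrete, here is a minimal, generic sketch. It splits a one-dimensional signal into a fixed number of segments so that the total within-segment squared error is minimal. This is a textbook formulation under stated assumptions, not SCYN's actual objective or implementation; the function name `segment_signal`, the squared-error cost, and the fixed segment count `k` are all hypothetical choices for illustration.

```python
def segment_signal(signal, k):
    """Split `signal` into k contiguous segments minimising squared error.

    Returns (breakpoints, total_error), where breakpoints are the start
    indices of segments 2..k. Generic DP sketch, not the SCYN algorithm.
    """
    n = len(signal)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(signal)")

    # Prefix sums of values and squared values give O(1) segment costs.
    s1, s2 = [0.0], [0.0]
    for v in signal:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Squared error of signal[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[s][j]: minimal error covering signal[:j] with s segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for s in range(1, k + 1):
        for j in range(s, n + 1):
            for i in range(s - 1, j):
                candidate = best[s - 1][i] + cost(i, j)
                if candidate < best[s][j]:
                    best[s][j] = candidate
                    back[s][j] = i

    # Walk the back-pointers from the end to recover segment starts.
    starts, j = [], n
    for s in range(k, 0, -1):
        j = back[s][j]
        starts.append(j)
    starts.reverse()
    return starts[1:], best[k][n]


# Hypothetical copy-number-like signal with shifts at indices 3 and 6.
print(segment_signal([2, 2, 2, 4, 4, 4, 1, 1], 3))
```

The triple loop runs in O(k n^2) time, which is fine for a toy signal; a practical caller would need a more scalable formulation and a principled way to choose the number of segments.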
Affiliation(s)
- Xikang Feng
  - School of Software, Northwestern Polytechnical University, Xi’an, Shaanxi, 710072, China
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
- Lingxi Chen
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
- Yuhao Qing
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
- Ruikang Li
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
- Chaohui Li
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
- Shuai Cheng Li
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
  - Department of Biomedical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China