1
Chang L, Jiao H, Chen J, Wu G, Liu P, Li R, Guo J, Long W, Tang X, Lu B, Xu H, Wu H. Single-cell whole-genome sequencing, haplotype analysis in prenatal diagnosis of monogenic diseases. Life Sci Alliance 2023; 6:e202201761. [PMID: 36810160 PMCID: PMC9947115 DOI: 10.26508/lsa.202201761] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/08/2022] [Revised: 02/10/2023] [Accepted: 02/10/2023] [Indexed: 02/24/2023] Open
Abstract
Monogenic inherited diseases are common causes of congenital disabilities, leading to severe economic and mental burdens on affected families. In our previous study, we demonstrated the validity of cell-based noninvasive prenatal testing (cbNIPT) in prenatal diagnosis by single-cell targeted sequencing. The present research further explored the feasibility of single-cell whole-genome sequencing (WGS) and haplotype analysis of various monogenic diseases with cbNIPT. Four families were recruited: one with inherited deafness, one with hemophilia, one with large vestibular aqueduct syndrome (LVAS), and one with no disease. Circulating trophoblast cells (cTBs) were obtained from maternal blood and analyzed by single-cell 15X WGS. Haplotype analysis showed that CFC178 (deafness family), CFC616 (hemophilia family), and CFC111 (LVAS family) inherited haplotypes from paternal and/or maternal pathogenic loci. Amniotic fluid or fetal villi samples from the deafness and hemophilia families confirmed these results. WGS performed better than targeted sequencing in genome coverage, allele dropout (ADO), and false-positive (FP) ratios. Our findings suggest that cbNIPT by WGS and haplotype analysis have great potential for use in prenatally diagnosing various monogenic diseases.
Affiliation(s)
- Liang Chang
  - Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
  - National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
  - Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education, Beijing, China
  - Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
- Haining Jiao
  - Department of Obstetrics and Gynecology, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jiucheng Chen
  - Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
- Guanlin Wu
  - Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
- Ping Liu
  - Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
  - National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
  - Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education, Beijing, China
  - Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
- Rong Li
  - Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
  - National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
  - Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education, Beijing, China
  - Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
- Jianying Guo
  - Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
  - National Clinical Research Center for Obstetrics and Gynecology (Peking University Third Hospital), Beijing, China
  - Key Laboratory of Assisted Reproduction (Peking University), Ministry of Education, Beijing, China
  - Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproductive Technology, Beijing, China
- Wenqing Long
  - Department of Obstetrics and Gynecology, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiaojian Tang
  - Department of Obstetrics and Gynecology, Rui-Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Bingjie Lu
  - Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
- Haibin Xu
  - Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
- Han Wu
  - Unimed Biotech (Shanghai) Co., Ltd., Shanghai, China
2
Li Y, Lin Y. DCHap: A Divide-and-Conquer Haplotype Phasing Algorithm for Third-Generation Sequences. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1277-1284. [PMID: 32750878 DOI: 10.1109/tcbb.2020.3005673] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
The development of DNA sequencing technologies makes it possible to obtain reads originating from both copies of a chromosome (two parental chromosomes, or haplotypes) of a single individual. Reconstruction of both haplotypes (i.e., haplotype phasing) plays a crucial role in genetic analysis and provides relationship information between genetic variation and disease susceptibility. With the emerging third-generation sequencing technologies, most existing approaches for haplotype phasing suffer from performance issues when handling long and error-prone reads. We develop a divide-and-conquer algorithm, DCHap, to phase haplotypes using third-generation reads. We benchmark DCHap against three state-of-the-art phasing tools on both PacBio SMRT data and ONT Nanopore data. The experimental results show that DCHap generates more accurate or comparable results (measured by the switch errors) while being scalable for higher coverage and longer reads. DCHap is a fast and accurate algorithm for haplotype phasing using third-generation sequencing data. As the third-generation sequencing platforms continue improving on their throughput and read lengths, accurate and scalable tools like DCHap are important to improve haplotype phasing from the advances of sequencing technologies. The source code is freely available at https://github.com/yanboANU/Haplotype-phasing.
3
Zhang L, Lv Y, Xu L, Zhou M. A Review of DNA Data Storage Technologies Based on Biomolecules. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210813101237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
In the information age, data storage technology has become the key to improving computer systems. Since traditional storage technologies cannot meet the demand for massive storage, new DNA storage technology based on biomolecules attracts much attention. DNA storage refers to the technology that uses artificially synthesized deoxynucleotide chains to store and read all information, such as documents, pictures, and audio. First, data are encoded into binary number strings. Then, the four types of base, A (Adenine), T (Thymine), C (Cytosine), and G (Guanine), are used to encode the corresponding binary numbers so that the data can be used to construct the target DNA molecules in the form of deoxynucleotide chains. Subsequently, the corresponding DNA molecules are artificially synthesized, enabling the data to be stored within them. Compared with traditional storage systems, DNA storage has major advantages, such as high storage density, long duration, as well as low hardware cost, high access parallelism, and strong scalability, which satisfies the demands for big data storage. This manuscript first reviews the origin and development of DNA storage technology, then the storage principles, contents, and methods are introduced. Finally, the development of DNA storage technology is analyzed. From the initial research to the cutting edge of this field and beyond, the advantages, disadvantages, and practical applications of DNA storage technology require continuous exploration.
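The encoding step described in this abstract (data to binary strings, binary strings to bases) can be illustrated with a minimal sketch. The abstract does not say which bits map to which base, so the two-bits-per-base table below is purely an assumption for illustration, and the function names `encode` and `decode` are hypothetical. Practical schemes also add constraints such as error correction, which this sketch leaves out.

```python
# Hypothetical 2-bit-per-base mapping; the abstract does not specify one.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a base string: 8 bits per byte, 2 bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): map each base back to 2 bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

Under this assumed mapping each byte becomes four bases, and `decode(encode(x))` returns `x` for any byte string.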
Affiliation(s)
- Lichao Zhang
  - Shenzhen Key Laboratory of Photonic Devices and Sensing Systems for Internet of Things, College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen 518060, China
- Yuanyuan Lv
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, Zhejiang, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518000, China
- Murong Zhou
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150000, China
4
Genome assembly and annotation. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00013-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
5
Guo Y, Cheng H, Yuan Z, Liang Z, Wang Y, Du D. Testing Gene-Gene Interactions Based on a Neighborhood Perspective in Genome-wide Association Studies. Front Genet 2021; 12:801261. [PMID: 34956337 PMCID: PMC8693929 DOI: 10.3389/fgene.2021.801261] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/25/2021] [Accepted: 11/15/2021] [Indexed: 12/21/2022] Open
Abstract
Unexplained genetic variation that causes complex diseases is often induced by gene-gene interactions (GGIs). Gene-based methods are one of the current statistical methodologies for discovering GGIs in case-control genome-wide association studies that are not only powerful statistically, but also interpretable biologically. However, most approaches include assumptions about the form of GGIs, which results in poor statistical performance. As a result, we propose gene-based testing based on the maximal neighborhood coefficient (MNC) called gene-based gene-gene interaction through a maximal neighborhood coefficient (GBMNC). MNC is a metric for capturing a wide range of relationships between two random vectors with arbitrary, but not necessarily equal, dimensions. We established a statistic that leverages the difference in MNC in case and in control samples as an indication of the existence of GGIs, based on the assumption that the joint distribution of two genes in cases and controls should not be substantially different if there is no interaction between them. We then used a permutation-based statistical test to evaluate this statistic and calculate a statistical p-value to represent the significance of the interaction. Experimental results using both simulation and real data showed that our approach outperformed earlier methods for detecting GGIs.
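The testing scheme in this abstract (a statistic that compares dependence in cases with dependence in controls, evaluated by permuting labels) can be sketched as follows. This is not the GBMNC implementation. The maximal neighborhood coefficient is not defined here, so the sketch substitutes a plain Pearson correlation between two scalar gene scores as a stand-in dependence measure. All function names and the add-one p-value correction are assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def dependence_gap(pairs, labels):
    """|dependence in cases - dependence in controls| (Pearson as a stand-in for MNC)."""
    def dep(group):
        xs, ys = zip(*group)
        return pearson(xs, ys)

    cases = [p for p, lab in zip(pairs, labels) if lab == 1]
    controls = [p for p, lab in zip(pairs, labels) if lab == 0]
    return abs(dep(cases) - dep(controls))


def permutation_p_value(pairs, labels, n_perm=1000, seed=0):
    """Shuffle case/control labels to build a null distribution for the gap."""
    rng = random.Random(seed)
    observed = dependence_gap(pairs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if dependence_gap(pairs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Each element of `pairs` is an `(x, y)` tuple of scores for the two genes in one sample, and `labels` marks cases as 1 and controls as 0. Both groups are assumed to be non-empty.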
Affiliation(s)
- Yingjie Guo
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Honghong Cheng
  - School of Information, Shanxi University of Finance and Economics, Taiyuan, China
- Zhian Yuan
  - Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China
- Zhen Liang
  - School of Life Science, Shanxi University, Taiyuan, China
- Yang Wang
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Debing Du
  - Beidahuang Industry Group General Hospital, Harbin, China
6
Guo Y, Wu C, Yuan Z, Wang Y, Liang Z, Wang Y, Zhang Y, Xu L. Gene-Based Testing of Interactions Using XGBoost in Genome-Wide Association Studies. Front Cell Dev Biol 2021; 9:801113. [PMID: 34977040 PMCID: PMC8716787 DOI: 10.3389/fcell.2021.801113] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/24/2021] [Accepted: 11/23/2021] [Indexed: 11/30/2022] Open
Abstract
Among the myriad of statistical methods that identify gene–gene interactions in the realm of qualitative genome-wide association studies, gene-based interactions are not only powerful statistically but also interpretable biologically. However, their statistical detection is limited by the assumptions they make on the association between traits and single nucleotide polymorphisms. Thus, a gene-based method derived from XGBoost (GGInt-XGBoost) is proposed in this article. Assuming that the log odds ratio of disease traits satisfies the additive relationship if the pair of genes has no interaction, the difference in error between the XGBoost model with and without the additive constraint could indicate gene–gene interaction; we then used a permutation-based statistical test to assess this difference and to provide a statistical p-value to represent the significance of the interaction. Experimental results on both simulated and real data showed that our approach performed better than previous experiments at detecting gene–gene interactions.
Affiliation(s)
- Yingjie Guo
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Chenxi Wu
  - Department of Mathematics, University of Wisconsin-Madison, Madison, WI, United States
- Zhian Yuan
  - Research Institute of Big Data Science and Industry, Shanxi University, Taiyuan, China
- Yansu Wang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Zhen Liang
  - School of Life Science, Shanxi University, Taiyuan, China
- Yang Wang
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Yi Zhang
  - Beidahuang Industry Group General Hospital, Harbin, China
  - *Correspondence: Yi Zhang; Lei Xu
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
  - *Correspondence: Yi Zhang; Lei Xu
7
Detecting and phasing minor single-nucleotide variants from long-read sequencing data. Nat Commun 2021; 12:3032. [PMID: 34031367 PMCID: PMC8144375 DOI: 10.1038/s41467-021-23289-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/24/2020] [Accepted: 04/15/2021] [Indexed: 02/04/2023] Open
Abstract
Cellular genetic heterogeneity is common in many biological conditions including cancer, microbiome, and co-infection of multiple pathogens. Detecting and phasing minor variants play an instrumental role in deciphering cellular genetic heterogeneity, but they are still difficult tasks because of technological limitations. Recently, long-read sequencing technologies, including those by Pacific Biosciences and Oxford Nanopore, provide an opportunity to tackle these challenges. However, high error rates make it difficult to take full advantage of these technologies. To fill this gap, we introduce iGDA, an open-source tool that can accurately detect and phase minor single-nucleotide variants (SNVs), whose frequencies are as low as 0.2%, from raw long-read sequencing data. We also demonstrate that iGDA can accurately reconstruct haplotypes in closely related strains of the same species (divergence ≥0.011%) from long-read metagenomic data.
8
Zhang Z, Cui F, Lin C, Zhao L, Wang C, Zou Q. Critical downstream analysis steps for single-cell RNA sequencing data. Brief Bioinform 2021; 22:6210064. [PMID: 33822873 DOI: 10.1093/bib/bbab105] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 01/05/2021] [Revised: 02/20/2021] [Accepted: 03/09/2021] [Indexed: 12/13/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled us to study biological questions at the single-cell level. Currently, many analysis tools are available to better utilize these relatively noisy data. In this review, we summarize the most widely used methods for critical downstream analysis steps (i.e. clustering, trajectory inference, cell-type annotation and integrating datasets). The advantages and limitations are comprehensively discussed, and we provide suggestions for choosing proper methods in different situations. We hope this paper will be useful for scRNA-seq data analysts and bioinformatics tool developers.
Affiliation(s)
- Zilong Zhang
  - University of Electronic Science and Technology of China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
9
Xie Y, Zhong Y, Chang J, Kwan HS. Chromosome-level de novo assembly of Coprinopsis cinerea A43mut B43mut pab1-1 #326 and genetic variant identification of mutants using Nanopore MinION sequencing. Fungal Genet Biol 2020; 146:103485. [PMID: 33253902 DOI: 10.1016/j.fgb.2020.103485] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/13/2020] [Revised: 10/22/2020] [Accepted: 11/13/2020] [Indexed: 11/26/2022]
Abstract
The homokaryotic Coprinopsis cinerea strain A43mut B43mut pab1-1 #326 is a widely used experimental model for developmental studies in mushroom-forming fungi. It can grow on defined artificial media and complete the whole lifecycle within two weeks. The mutations in mating type factors A and B result in the special feature of clamp formation and fruiting without mating. This feature allows investigations and manipulations with a homokaryotic genetic background. The current genome assembly of strain #326 was based on short-read sequencing data and was highly fragmented, leading to bias in gene annotation and downstream analyses. Here, we report a chromosome-level genome assembly of strain #326. Oxford Nanopore Technology (ONT) MinION sequencing was used to get long reads. Illumina short reads were used to polish the sequences. A combined assembly yielded 13 chromosomes and a mitochondrial genome as individual scaffolds. The assembly has 15,250 annotated genes with a high synteny with the C. cinerea strain Okayama-7 #130. This assembly greatly improves contiguity and annotations. It is a suitable reference for further genomic studies, especially for the genetic, genomic and transcriptomic analyses in ONT long reads. Single nucleotide variants and structural variants in six mutagenized and cisplatin-screened mutants could be identified and validated. A 66 bp deletion in Ras GTPase-activating protein (RasGAP) was found in all mutants. To make better use of the ONT sequencing platform, we modified a high-molecular-weight genomic DNA isolation protocol based on magnetic beads for filamentous fungi. This study showed the use of MinION to construct a fungal reference genome and to perform downstream studies in an individual laboratory. An experimental workflow was proposed, from DNA isolation and whole genome sequencing, to genome assembly and variant calling. Our results provided solutions and parameters for fungal genomic analysis on the MinION sequencing platform.
Affiliation(s)
- Yichun Xie
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Special Administrative Region
- Yiyi Zhong
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Special Administrative Region
- Jinhui Chang
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Special Administrative Region
  - The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, China
- Hoi Shan Kwan
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Special Administrative Region
10
Li Y, Zhang Z, Teng Z, Liu X. PredAmyl-MLP: Prediction of Amyloid Proteins Using Multilayer Perceptron. Computational and Mathematical Methods in Medicine 2020; 2020:8845133. [PMID: 33294004 PMCID: PMC7700051 DOI: 10.1155/2020/8845133] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/14/2020] [Revised: 10/06/2020] [Accepted: 10/31/2020] [Indexed: 01/20/2023]
Abstract
Amyloid is generally an aggregate of insoluble fibrin; its abnormal deposition is the pathogenic mechanism of various diseases, such as Alzheimer's disease and type II diabetes. Therefore, accurately identifying amyloid is necessary to understand its role in pathology. We proposed a machine learning-based prediction model called PredAmyl-MLP, which consists of the following three steps: feature extraction, feature selection, and classification. In the step of feature extraction, seven feature extraction algorithms and different combinations of them are investigated, and the combination of SVMProt-188D and tripeptide composition (TPC) is selected according to the experimental results. In the step of feature selection, maximum relevant maximum distance (MRMD) and binomial distribution (BD) are, respectively, used to remove the redundant or noise features, and the appropriate features are selected according to the experimental results. In the step of classification, we employed multilayer perceptron (MLP) to train the prediction model. The 10-fold cross-validation results show that the overall accuracy of PredAmyl-MLP reached 91.59%, and the performance was better than the existing methods.
Affiliation(s)
- Yanjuan Li
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Zitong Zhang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Zhixia Teng
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Xiaoyan Liu
  - College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150040, China
11
Li S, Jiang L, Tang J, Gao N, Guo F. Kernel Fusion Method for Detecting Cancer Subtypes via Selecting Relevant Expression Data. Front Genet 2020; 11:979. [PMID: 33133130 PMCID: PMC7511763 DOI: 10.3389/fgene.2020.00979] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/08/2020] [Accepted: 08/03/2020] [Indexed: 12/19/2022] Open
Abstract
Recently, cancer has been characterized as a heterogeneous disease composed of many different subtypes. Early diagnosis of cancer subtypes is an important study of cancer research, which can be of tremendous help to patients after treatment. In this paper, we first extract a novel dataset, which contains gene expression, miRNA expression, and isoform expression of five cancers from The Cancer Genome Atlas (TCGA). Next, to avoid the effect of noise existing in 60,483 genes, we select a small number of genes by using LASSO that employs gene expression and survival time of patients. Then, we construct one similarity kernel for each expression data by using Chebyshev distance. We also used SKF to fuse the three similarity matrices (gene, isoform, and miRNA) and finally clustered the fused similarity matrix with spectral clustering. In the experimental results, our method has a better P-value in the Cox model than other methods on 10 cancer data from the Jiang Dataset and the Novel Dataset. We have drawn different survival curves for different cancers and found that some genes play a key role in cancer. For breast cancer, we find that HSPA2A, RNASE1, CLIC6, and IFITM1 are highly expressed in some specific groups. For lung cancer, we find that C4BPA, SESN3, and IRS1 are highly expressed in some specific groups. The code and all supporting data files are available from https://github.com/guofei-tju/Uncovering-Cancer-Subtypes-via-LASSO.
Affiliation(s)
- Shuhao Li
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Limin Jiang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Nan Gao
  - School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
12
Zhuang H, Zhang Y, Yang S, Cheng L, Liu SL. A Mendelian Randomization Study on Infant Length and Type 2 Diabetes Mellitus Risk. Curr Gene Ther 2020; 19:224-231. [PMID: 31553296 DOI: 10.2174/1566523219666190925115535] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/30/2019] [Revised: 06/15/2019] [Accepted: 06/16/2019] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Infant length (IL) is a positively associated phenotype of type 2 diabetes mellitus (T2DM), but the causal relationship is still unclear. Here, we applied a Mendelian randomization (MR) study to explore the causal relationship between IL and T2DM, which has the potential to provide guidance for assessing T2DM activity and T2DM prevention in young at-risk populations. MATERIALS AND METHODS To classify the study, a two-sample MR, using genetic instrumental variables (IVs) to explore the causal effect, was applied to test the influence of IL on the risk of T2DM. In this study, MR was carried out on GWAS data using 8 independent IL SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated by the inverse-variance weighted (IVW) method to assess the risk that shorter IL brings to T2DM. Sensitivity validation was conducted to identify the effect of individual SNPs. MR-Egger regression was used to detect pleiotropic bias of IVs. RESULTS The pooled odds ratio from the IVW method was 1.03 (95% CI 0.89-1.18, P = 0.0785), the intercept was low (-0.477, P = 0.252), and the ORs fluctuated only slightly in leave-one-out validation, from -0.062 ((0.966 - 1.03) / 1.03) to 0.05 ((1.081 - 1.03) / 1.03). CONCLUSION We validated that the shorter IL causes no additional risk to T2DM. The sensitivity analysis and the MR-Egger regression analysis also provided adequate evidence that the above result was not due to any heterogeneity or pleiotropic effect of IVs.
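The leave-one-out fluctuation quoted in the RESULTS can be reproduced from the three odds ratios given in the abstract. The short sketch below only redoes that arithmetic. The helper name `relative_change` is hypothetical, and the per-SNP estimates are not available here, so the pooled IVW odds ratio itself is taken as given rather than recomputed.

```python
def relative_change(leave_one_out_or: float, pooled_or: float) -> float:
    """Relative shift of a leave-one-out OR against the pooled OR."""
    return (leave_one_out_or - pooled_or) / pooled_or


pooled = 1.03  # pooled IVW odds ratio reported in the abstract
lowest = relative_change(0.966, pooled)   # smallest leave-one-out OR
highest = relative_change(1.081, pooled)  # largest leave-one-out OR
```

Rounded to three decimals, `lowest` and `highest` agree with the range of -0.062 to 0.05 stated in the abstract.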
Affiliation(s)
- He Zhuang
  - Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
  - HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
- Ying Zhang
  - Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, 150001, Harbin, China
- Shuo Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shu-Lin Liu
  - Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
  - HMU-UCFM Centre for Infection and Genomics, Harbin Medical University, Harbin, China
  - Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, Canada
  - Department of Infectious Diseases, The First Affiliated Hospital, Harbin Medical University, Harbin, China
  - Translational Medicine Research and Cooperation Center of Northern China, Heilongjiang Academy of Medical Sciences, Harbin, China
13
Yuan L, Guo F, Wang L, Zou Q. Prediction of tumor metastasis from sequencing data in the era of genome sequencing. Brief Funct Genomics 2020; 18:412-418. [PMID: 31204784 DOI: 10.1093/bfgp/elz010] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/28/2019] [Revised: 02/22/2019] [Accepted: 04/26/2019] [Indexed: 02/01/2023] Open
Abstract
Tumor metastasis is the key reason for the high mortality rate of tumors. A growing number of scholars have begun to pay attention to the research on tumor metastasis and have achieved satisfactory results in this field. The advent of the era of sequencing has enabled us to study cancer metastasis at the molecular level, which is essential for understanding the molecular mechanism of metastasis, identifying diagnostic markers and therapeutic targets and guiding clinical decision-making. We reviewed the metastasis-related studies using sequencing data, covering detection of metastasis origin sites, determination of metastasis potential and identification of distal metastasis sites. These findings include the discovery of relevant markers and the presentation of prediction tools. Finally, we discussed the challenge of studying metastasis considering the difficulty of obtaining metastatic cancer data, the complexity of tumor heterogeneity and the uncertainty of sample labels.
Affiliation(s)
- Linlin Yuan
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Lei Wang
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
14
Yan Z, Zhu X, Wang Y, Nie Y, Guan S, Kuo Y, Chang D, Li R, Qiao J, Yan L. scHaplotyper: haplotype construction and visualization for genetic diagnosis using single cell DNA sequencing data. BMC Bioinformatics 2020; 21:41. [PMID: 32007105 PMCID: PMC6995221 DOI: 10.1186/s12859-020-3381-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/07/2019] [Accepted: 01/22/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Haplotyping reveals chromosome blocks inherited from parents to in vitro fertilized (IVF) embryos in preimplantation genetic diagnosis (PGD), enabling the observation of the transmission of disease alleles between generations. However, the methods of haplotyping that are suitable for single cells are limited because a whole genome amplification (WGA) process is performed before sequencing or genotyping in PGD, and true haplotype profiles of embryos need to be constructed based on genotypes that can contain many WGA artifacts. RESULTS Here, we offer scHaplotyper as a genetic diagnosis tool that reconstructs and visualizes the haplotype profiles of single cells based on the Hidden Markov Model (HMM). scHaplotyper can trace the origin of each haplotype block in the embryo, enabling the detection of carrier status of disease alleles in each embryo. We applied this method in PGD in two families affected with genetic disorders, and the result was the healthy live births of two children in the two families, demonstrating the clinical application of this method. CONCLUSION Next generation sequencing (NGS) of preimplantation embryos enable genetic screening for families with genetic disorders, avoiding the birth of affected babies. With the validation and successful clinical application, we showed that scHaplotyper is a convenient and accurate method to screen out embryos. More patients with genetic disorder will benefit from the genetic diagnosis of embryos. The source code of scHaplotyper is available at GitHub repository: https://github.com/yzqheart/scHaplotyper.
Affiliation(s)
- Zhiqiang Yan
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China; Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China; Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Xiaohui Zhu
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Yuqian Wang
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Yanli Nie
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Shuo Guan
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Ying Kuo
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Di Chang
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Rong Li
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
- Jie Qiao
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China; Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China; Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China; Beijing Advanced Innovation Center for Genomics (ICG), Peking University, Beijing, 100871, China
- Liying Yan
- Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, 100191, China; Key Laboratory of Assisted Reproduction, Ministry of Education, Beijing, 100191, China; Beijing Key Laboratory of Reproductive Endocrinology and Assisted Reproduction, Beijing, 100191, China
15
Pasquali F, Do Valle I, Palma F, Remondini D, Manfreda G, Castellani G, Hendriksen RS, De Cesare A. Application of different DNA extraction procedures, library preparation protocols and sequencing platforms: impact on sequencing results. Heliyon 2019; 5:e02745. [PMID: 31720479 PMCID: PMC6838873 DOI: 10.1016/j.heliyon.2019.e02745] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/09/2018] [Revised: 04/01/2019] [Accepted: 10/25/2019] [Indexed: 01/22/2023]
Abstract
In this study, three DNA extraction procedures, two library preparation protocols and two sequencing platforms were applied to analyse six bacterial cultures and their corresponding DNA obtained as part of a proficiency test. The impact of each variable on sequencing results was assessed using the following parameters: read quality, assembly and alignment statistics; the number of single nucleotide polymorphisms (SNPs), detected by applying assembly- and alignment-based strategies; and antimicrobial resistance genes (ARGs), identified on de novo assemblies of all sequenced genomes. The investigated nucleic acid extraction procedures, library preparation kits and sequencing platforms did not significantly affect de novo assembly statistics or the number of SNPs and ARGs. The only exception was observed for two duplicates, which were associated with one PCR-based library preparation kit. Results from this comparative study can support researchers in choosing among the available pre-sequencing and sequencing options, and might suggest further comparisons to be performed.
Affiliation(s)
- F Pasquali
- Department of Food and Agricultural Sciences, Alma Mater Studiorum-University of Bologna, via del Florio 2, Ozzano dell'Emilia, 40064 Italy
- I Do Valle
- Department of Physics, Northeastern University, 360 Huntington Avenue, Boston, MA, 02115-5000, USA
- F Palma
- Department of Food and Agricultural Sciences, Alma Mater Studiorum-University of Bologna, via del Florio 2, Ozzano dell'Emilia, 40064 Italy
- D Remondini
- Department of Physics and Astronomy, Alma Mater Studiorum-University of Bologna, viale Berti Pichat 6/2, 40127, Bologna, Italy
- G Manfreda
- Department of Food and Agricultural Sciences, Alma Mater Studiorum-University of Bologna, via del Florio 2, Ozzano dell'Emilia, 40064 Italy
- G Castellani
- Department of Physics and Astronomy, Alma Mater Studiorum-University of Bologna, viale Berti Pichat 6/2, 40127, Bologna, Italy
- R S Hendriksen
- Technical University of Denmark, Kemitorvet, Kgs. Lyngby, 2800, Denmark
- A De Cesare
- Department of Food and Agricultural Sciences, Alma Mater Studiorum-University of Bologna, via del Florio 2, Ozzano dell'Emilia, 40064 Italy
16
Edge P, Bansal V. Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing. Nat Commun 2019; 10:4660. [PMID: 31604920 PMCID: PMC6788989 DOI: 10.1038/s41467-019-12493-y] [Citation(s) in RCA: 146] [Impact Index Per Article: 24.3] [Received: 02/27/2019] [Accepted: 09/10/2019] [Indexed: 12/30/2022]
Abstract
Whole-genome sequencing using sequencing technologies such as Illumina enables the accurate detection of small-scale variants but provides limited information about haplotypes and variants in repetitive regions of the human genome. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads that can potentially address the limitations of short-read sequencing. However, the high error rate of SMS reads makes it challenging to detect small-scale variants in diploid genomes. We introduce a variant calling method, Longshot, which leverages the haplotype information present in SMS reads to accurately detect and phase single-nucleotide variants (SNVs) in diploid genomes. We demonstrate that Longshot achieves very high accuracy for SNV detection using whole-genome Pacific Biosciences data, outperforms existing variant calling methods, and enables variant detection in duplicated regions of the genome that cannot be mapped using short reads. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads with a high error rate. Here, the authors develop Longshot, a computational method that detects and phases single-nucleotide variants (SNVs) in diploid genomes using SMS data.
Affiliation(s)
- Peter Edge
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, 92093, USA
- Vikas Bansal
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, California, 92093, USA
17
Ebler J, Haukness M, Pesout T, Marschall T, Paten B. Haplotype-aware diplotyping from noisy long reads. Genome Biol 2019; 20:116. [PMID: 31159868 PMCID: PMC6547545 DOI: 10.1186/s13059-019-1709-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 07/25/2018] [Accepted: 05/06/2019] [Indexed: 12/19/2022]
Abstract
Current genotyping approaches for single-nucleotide variations rely on short, accurate reads from second-generation sequencing devices. Presently, third-generation sequencing platforms are rapidly becoming more widespread, yet approaches for leveraging their long but error-prone reads for genotyping are lacking. Here, we introduce a novel statistical framework for the joint inference of haplotypes and genotypes from noisy long reads, which we term diplotyping. Our technique takes full advantage of linkage information provided by long reads. We validate hundreds of thousands of candidate variants that have not yet been included in the high-confidence reference set of the Genome-in-a-Bottle effort.
Affiliation(s)
- Jana Ebler
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, 66123, Germany
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
- Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, Saarbrücken, Germany
- Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, 95064, CA, USA
- Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, 95064, CA, USA
- Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, 66123, Germany
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, 95064, CA, USA
18
Zhuang H, Han J, Cheng L, Liu SL. A Positive Causal Influence of IL-18 Levels on the Risk of T2DM: A Mendelian Randomization Study. Front Genet 2019; 10:295. [PMID: 31024619 PMCID: PMC6459887 DOI: 10.3389/fgene.2019.00295] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/10/2019] [Accepted: 03/19/2019] [Indexed: 12/21/2022]
Abstract
A large number of clinical studies have shown that interleukin-18 (IL-18) plasma levels are positively correlated with the pathogenesis and development of type 2 diabetes mellitus (T2DM), but it remains unclear whether IL-18 causes T2DM, primarily due to the influence of reverse causality and residual confounding factors. Genome-wide association studies have led to the discovery of numerous common variants associated with IL-18 and T2DM and opened unprecedented opportunities for investigating possible associations between genetic traits and diseases. In this study, we employed a two-sample Mendelian randomization (MR) method to analyze the causal relationships between IL-18 plasma levels and T2DM using IL18-related SNPs as genetic instrumental variables (IVs). We first selected eight SNPs that were significantly associated with IL-18 but independent of T2DM. We then used these SNPs as IVs to evaluate their effects on T2DM using the inverse-variance weighted (IVW) method. Finally, we conducted sensitivity analysis and MR-Egger regression analysis to evaluate the heterogeneity and pleiotropic effects of each variant. The results based on the IVW method demonstrate that high IL-18 plasma levels significantly increase the risk of T2DM, and no heterogeneity or pleiotropic effects appeared after the sensitivity and MR-Egger analyses.
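The inverse-variance weighted (IVW) step named in the abstract above can be sketched as follows. This is a minimal fixed-effect version of the standard IVW estimator, not the authors' analysis code: the function name `ivw_estimate` and all numeric inputs are hypothetical, and the standard error uses the common first-order approximation that ignores uncertainty in the SNP-exposure effects. Each SNP contributes a Wald ratio (SNP-outcome effect divided by SNP-exposure effect), weighted by the inverse of that ratio's approximate variance.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exposure -- SNP effects on the exposure (e.g. IL-18 level)
    beta_outcome  -- SNP effects on the outcome (e.g. T2DM log-odds)
    se_outcome    -- standard errors of the SNP-outcome effects
    """
    # Wald ratio per SNP: estimated causal effect implied by that instrument.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # Inverse-variance weight of each ratio (first-order approximation).
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = sqrt(1.0 / total)
    return estimate, std_error


# Hypothetical summary statistics for two instruments (illustrative only).
estimate, std_error = ivw_estimate(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[0.05, 0.1],
    se_outcome=[0.01, 0.02],
)
```

Sensitivity checks such as MR-Egger regression, which the abstract also mentions, relax the assumption made here that every instrument affects the outcome only through the exposure.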
Affiliation(s)
- He Zhuang
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China
- Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shu-Lin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada
19
Jiang L, Xiao Y, Ding Y, Tang J, Guo F. Discovering Cancer Subtypes via an Accurate Fusion Strategy on Multiple Profile Data. Front Genet 2019; 10:20. [PMID: 30804977 PMCID: PMC6370730 DOI: 10.3389/fgene.2019.00020] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 10/30/2018] [Accepted: 01/15/2019] [Indexed: 01/03/2023]
Abstract
Discovering cancer subtypes is useful for guiding clinical treatment of multiple cancers. Advances in tissue profiling technologies have produced diverse types of data. Based on these types of expression data, various computational methods have been proposed to predict cancer subtypes. It is crucial to study how to better integrate these multiple profiles of data. In this paper, we collect multiple profiles of data for five cancers from The Cancer Genome Atlas (TCGA). Then, we construct three similarity kernels for all patients with the same cancer from gene expression, miRNA expression and isoform expression data. We also propose a novel unsupervised multiple kernel fusion method, Similarity Kernel Fusion (SKF), to integrate the three similarity kernels into one combined kernel. Finally, we apply spectral clustering to the integrated kernel to predict cancer subtypes. In the experimental results, the P-values from the Cox regression model and survival curve analysis can be used to evaluate the performance of predicted subtypes on three datasets. Our kernel fusion method, SKF, performs better than single-kernel and other multiple kernel fusion strategies. This demonstrates that our method can identify more accurate subtypes across various kinds of cancers. Our cancer subtype prediction method can identify essential genes and biomarkers for disease diagnosis and prognosis, and we also discuss the possible side effects of therapies and treatment.
Affiliation(s)
- Limin Jiang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yongkang Xiao
- School of Chemical Engineering and Technology, Tianjin University, Tianjin, China
- Yijie Ding
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China