51
Duan Z, Qiao Y, Lu J, Lu H, Zhang W, Yan F, Sun C, Hu Z, Zhang Z, Li G, Chen H, Xiang Z, Zhu Z, Zhao H, Yu Y, Wei C. HUPAN: a pan-genome analysis pipeline for human genomes. Genome Biol 2019; 20:149. [PMID: 31366358 PMCID: PMC6670167 DOI: 10.1186/s13059-019-1751-y] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 11/27/2018] [Accepted: 07/01/2019] [Indexed: 12/13/2022] Open
Abstract
The human reference genome is still incomplete, especially for those population-specific or individual-specific regions, which may have important functions. Here, we developed a HUman Pan-genome ANalysis (HUPAN) system to build the human pan-genome. We applied it to 185 deep sequencing and 90 assembled Han Chinese genomes and detected 29.5 Mb novel genomic sequences and at least 188 novel protein-coding genes missing in the human reference genome (GRCh38). It can be an important resource for the human genome-related biomedical studies, such as cancer genome analysis. HUPAN is freely available at http://cgm.sjtu.edu.cn/hupan/ and https://github.com/SJTU-CGM/HUPAN .
Affiliation(s)
- Zhongqu Duan
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Yuyang Qiao
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Jinyuan Lu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Huimin Lu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Wenmin Zhang
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Fazhe Yan
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Chen Sun
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Zhiqiang Hu
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
- Zhen Zhang
  - Department of Radiation Oncology and Department of Oncology, Shanghai Medical College, Fudan University Shanghai Cancer Center, 270 Dong An Road, Shanghai, 200032, China
- Guichao Li
  - Department of Radiation Oncology and Department of Oncology, Shanghai Medical College, Fudan University Shanghai Cancer Center, 270 Dong An Road, Shanghai, 200032, China
- Hongzhuan Chen
  - Department of Pharmacology, Shanghai Key Laboratory For Translational Medicine, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, Shanghai, 200025, China
- Zhen Xiang
  - Department of Surgery, Ruijin Hospital, Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Road, Shanghai, 200025, China
- Zhenggang Zhu
  - Department of Surgery, Ruijin Hospital, Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Road, Shanghai, 200025, China
- Hongyu Zhao
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
  - Department of Biostatistics, Yale University, 60 College Street, New Haven, CT, 06520, USA
- Yingyan Yu
  - Department of Surgery, Ruijin Hospital, Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, 197 Ruijin Road, Shanghai, 200025, China
- Chaochun Wei
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
  - Shanghai Center for Bioinformation Technology, 1278 Keyuan Road, Pudong District, Shanghai, 201203, China
52
Building a sequence map of the pig pan-genome from multiple de novo assemblies and Hi-C data. SCIENCE CHINA-LIFE SCIENCES 2019; 63:750-763. [PMID: 31290097 DOI: 10.1007/s11427-019-9551-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 01/07/2019] [Accepted: 04/03/2019] [Indexed: 01/23/2023]
Abstract
Pigs were domesticated independently in the Near East and China, indicating that a single reference genome from one individual is unable to represent the full spectrum of divergent sequences in pigs worldwide. Therefore, 12 de novo pig assemblies from Eurasia were compared in this study to identify the missing sequences from the reference genome. As a result, 72.5 Mb of non-redundant sequences (∼3% of the genome) were found to be absent from the reference genome (Sscrofa11.1) and were defined as pan-sequences. Of the pan-sequences, 9.0 Mb were dominant in Chinese pigs, in contrast with their low frequency in European pigs. One sequence dominant in Chinese pigs contained the complete genic region of the tazarotene-induced gene 3 (TIG3) gene which is involved in fatty acid metabolism. Using flanking sequences and Hi-C based methods, 27.7% of the sequences could be anchored to the reference genome. The supplementation of these sequences could contribute to the accurate interpretation of the 3D chromatin structure. A web-based pan-genome database was further provided to serve as a primary resource for exploration of genetic diversity and promote pig breeding and biomedical research.
53
Zhang C, Bao C, Zhang X, Lin X, Pan D, Chen Y. Knockdown of lncRNA LEF1-AS1 inhibited the progression of oral squamous cell carcinoma (OSCC) via Hippo signaling pathway. Cancer Biol Ther 2019; 20:1213-1222. [PMID: 30983488 DOI: 10.1080/15384047.2019.1599671] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 02/06/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are known to play crucial roles in various cancers. LncRNA LEF1-AS1 is a reported oncogene in colorectal cancer and glioblastoma. In this study, we found that LEF1-AS1 was markedly increased in oral squamous cell carcinoma (OSCC) tissues and cell lines, and that OSCC patients with high LEF1-AS1 levels tended to have a poor prognosis. Functionally, LEF1-AS1 knockdown inhibited cell survival, proliferation and migration, while enhancing apoptosis and inducing G0/G1 cell cycle arrest in vitro. Consistently, LEF1-AS1 silencing hindered tumor growth in vivo. Moreover, inhibiting LEF1-AS1, which interacts directly with LATS1, activated the Hippo signaling pathway. LEF1-AS1 silencing abolished the interaction of LEF1-AS1 with LATS1 and enhanced the binding of LATS1 to MOB, thereby promoting YAP phosphorylation and impairing YAP1 nuclear translocation. LEF1-AS1 regulated YAP1 translocation in a LATS1-dependent manner, and YAP1 overexpression abolished the suppressive effect of LEF1-AS1 repression on the biological processes of OSCC cells. In summary, LEF1-AS1 plays an oncogenic role in OSCC by interacting with LATS1 to suppress the Hippo signaling pathway, suggesting its therapeutic and prognostic potential in OSCC.
Affiliation(s)
- Chanqiong Zhang
  - Department of Pathology, Wenzhou People's Hospital, Wenzhou, Zhejiang, China
- Chunchun Bao
  - Division of PET/CT, Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Xiuxing Zhang
  - Division of PET/CT, Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Xinshi Lin
  - Division of PET/CT, Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Dan Pan
  - Department of Pathology, Wenzhou People's Hospital, Wenzhou, Zhejiang, China
- Yangzong Chen
  - Division of PET/CT, Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
54
Genome maps across 26 human populations reveal population-specific patterns of structural variation. Nat Commun 2019; 10:1025. [PMID: 30833565 PMCID: PMC6399254 DOI: 10.1038/s41467-019-08992-7] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Received: 12/18/2018] [Accepted: 02/12/2019] [Indexed: 01/10/2023] Open
Abstract
Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome. Large structural variants (SV) are understudied in human genetics research because of the difficulty to detect them in the routinely generated short-read sequencing data. Here, the authors generate optical genome maps of 154 individuals from 26 populations that allow comprehensive examination of large SVs.
55
Rigau M, Juan D, Valencia A, Rico D. Intronic CNVs and gene expression variation in human populations. PLoS Genet 2019; 15:e1007902. [PMID: 30677042 PMCID: PMC6345438 DOI: 10.1371/journal.pgen.1007902] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 05/15/2018] [Accepted: 12/17/2018] [Indexed: 11/19/2022] Open
Abstract
Introns can be extraordinarily large and they account for the majority of the DNA sequence in human genes. However, little is known about their population patterns of structural variation and their functional implication. By combining the most extensive maps of CNVs in human populations, we have found that intronic losses are the most frequent copy number variants (CNVs) in protein-coding genes in human, with 12,986 intronic deletions, affecting 4,147 genes (including 1,154 essential genes and 1,638 disease-related genes). This intronic length variation results in dozens of genes showing extreme population variability in size, with 40 genes with 10 or more different sizes and up to 150 allelic sizes. Intronic losses are frequent in evolutionarily ancient genes that are highly conserved at the protein sequence level. This result contrasts with losses overlapping exons, which are observed less often than expected by chance and almost exclusively affect primate-specific genes. An integrated analysis of CNVs and RNA-seq data showed that intronic loss can be associated with significant differences in gene expression levels in the population (CNV-eQTLs). These intronic CNV-eQTLs regions are enriched for intronic enhancers and can be associated with expression differences of other genes showing long distance intron-promoter 3D interactions. Our data suggests that intronic structural variation of protein-coding genes makes an important contribution to the variability of gene expression and splicing in human populations.
Affiliation(s)
- Maria Rigau
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- David Juan
  - Institut de Biologia Evolutiva, Consejo Superior de Investigaciones Científicas–Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Daniel Rico
  - Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
56
De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data. Genes (Basel) 2018; 9:genes9100486. [PMID: 30304863 PMCID: PMC6210158 DOI: 10.3390/genes9100486] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 08/28/2018] [Revised: 09/21/2018] [Accepted: 10/05/2018] [Indexed: 12/16/2022] Open
Abstract
The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.