1
Perini S, Johannesson K, Butlin RK, Westram AM. Short INDELs and SNPs as markers of evolutionary processes in hybrid zones. J Evol Biol 2025; 38:367-378. [PMID: 39803902 DOI: 10.1093/jeb/voaf002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 10/28/2024] [Accepted: 01/11/2025] [Indexed: 03/06/2025]
Abstract
Polymorphic short insertions and deletions (INDELs ≤ 50 bp) are abundant, although less common than single nucleotide polymorphisms (SNPs). Evidence from model organisms shows INDELs to be more strongly influenced by purifying selection than SNPs. Partly for this reason, INDELs are rarely used as markers for demographic processes or to detect divergent selection. Here, we compared INDELs and SNPs in the intertidal snail Littorina saxatilis, focussing on hybrid zones between ecotypes, in order to test the utility of INDELs in the detection of divergent selection. We computed INDEL and SNP site frequency spectra using capture sequencing data. We assessed the impact of divergent selection by analyzing allele frequency clines across habitat boundaries. We also examined the influence of GC-biased gene conversion because it may be confounded with signatures of selection. We show evidence that short INDELs are affected more by purifying selection than SNPs, but part of the observed site frequency spectra difference can be attributed to GC-biased gene conversion. We did not find a difference in the impact of divergent selection between short INDELs and SNPs. Short INDELs and SNPs were similarly distributed across the genome and so are likely to respond to indirect selection in the same way. A few regions likely affected by divergent selection were revealed by INDELs and not by SNPs. Short INDELs can be useful (additional) genetic markers helping to identify genomic regions important for adaptation and population divergence.
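The comparison above rests on site frequency spectra (SFS). The following is a minimal sketch. It assumes each polymorphic site has already been reduced to a count of derived alleles among n sampled haplotypes. When the ancestral state is unknown, it uses alternative-allele counts instead. The tally below builds the unfolded and folded spectra. The function names and input format are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter


def unfolded_sfs(derived_counts, n):
    """Unfolded SFS for n sampled haplotypes.

    derived_counts holds, for each site, how many haplotypes carry the derived
    allele (assumes sites are already polarized against an outgroup).
    Element i-1 of the result counts sites with exactly i derived copies,
    for i = 1 .. n-1; monomorphic sites (0 or n copies) are ignored.
    """
    counts = Counter(k for k in derived_counts if 0 < k < n)
    return [counts.get(i, 0) for i in range(1, n)]


def folded_sfs(alt_counts, n):
    """Folded SFS: bins sites by minor-allele count (1 .. n//2) when the
    ancestral state is unknown."""
    counts = Counter(min(k, n - k) for k in alt_counts if 0 < k < n)
    return [counts.get(i, 0) for i in range(1, n // 2 + 1)]


def proportions(sfs):
    """Normalize an SFS so that spectra built from different numbers of sites
    (e.g. INDELs vs SNPs) can be compared directly."""
    total = sum(sfs)
    return [x / total for x in sfs] if total else [0.0] * len(sfs)


# Hypothetical derived-allele counts for n = 8 haplotypes.
snp_counts = [1, 2, 5, 3, 1, 7, 4, 2]
indel_counts = [1, 1, 1, 2, 1, 3]
print(proportions(unfolded_sfs(snp_counts, 8))[0])    # share of singletons, SNPs
print(proportions(unfolded_sfs(indel_counts, 8))[0])  # share of singletons, INDELs
```

The abstract attributes a higher share of rare variants among INDELs partly to stronger purifying selection. As the authors note, GC-biased gene conversion can shift spectra in a similar way. A raw SFS difference on its own therefore does not separate the two explanations.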
Affiliation(s)
- Samuel Perini
- Department of Marine Sciences, University of Gothenburg, Tjärnö Marine Laboratory, Strömstad, Sweden
- Kerstin Johannesson
- Department of Marine Sciences, University of Gothenburg, Tjärnö Marine Laboratory, Strömstad, Sweden
- Roger K Butlin
- Department of Marine Sciences, University of Gothenburg, Tjärnö Marine Laboratory, Strömstad, Sweden
- Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Sheffield, United Kingdom
- Anja M Westram
- ISTA (Institute of Science and Technology Austria), Klosterneuburg, Austria
- Faculty of Biosciences and Aquaculture, Nord University, Bodø, Norway
2
Márquez‐Corro JI, Martín‐Bravo S, Blanco‐Pastor JL, Luceño M, Escudero M. The holocentric chromosome microevolution: From phylogeographic patterns to genomic associations with environmental gradients. Mol Ecol 2024; 33:e17156. [PMID: 37795678 PMCID: PMC11628669 DOI: 10.1111/mec.17156] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/13/2023] [Revised: 09/19/2023] [Accepted: 09/22/2023] [Indexed: 10/06/2023]
Abstract
Geographic isolation and chromosome evolution are two of the major drivers of diversification in eukaryotes in general, and specifically, in plants. On one hand, range shifts induced by Pleistocene glacial oscillations deeply shaped the evolutionary trajectories of species in the Northern Hemisphere. On the other hand, karyotype variability within species or species complexes may have adaptive potential as different karyotypes may represent different recombination rates and linkage groups that may be associated with locally adapted genes or supergenes. Organisms with holocentric chromosomes are ideal to study the link between local adaptation and chromosome evolution, due to their high cytogenetic variability, especially when it seems to be related to environmental variation. Here, we integrate the study of the phylogeography, chromosomal evolution and ecological requirements of a plant species complex distributed in the Western Euro-Mediterranean region (Carex gr. laevigata, Cyperaceae). We aim to clarify the relative influence of these factors on population differentiation and ultimately on speciation. We obtained a well-resolved RADseq phylogeny that sheds light on the phylogeographic patterns of molecular and chromosome number variation, which are compatible with south-to-north postglacial migration. In addition, landscape genomics analyses identified candidate loci for local adaptation, and also strong significant associations between the karyotype and the environment. We conclude that karyotype distribution in C. gr. laevigata has been constrained by both range shift dynamics and local adaptation. Our study demonstrates that chromosome evolution may be responsible, at least partially, for microevolutionary patterns of population differentiation and adaptation in Carex.
Affiliation(s)
- José Ignacio Márquez‐Corro
- Departamento de Biología Molecular e Ingeniería Bioquímica, Universidad Pablo de Olavide, Seville, Spain
- Jodrell Laboratory, Department of Trait Diversity and Function, Royal Botanic Gardens, Kew, Richmond, UK
- Santiago Martín‐Bravo
- Departamento de Biología Molecular e Ingeniería Bioquímica, Universidad Pablo de Olavide, Seville, Spain
- José Luis Blanco‐Pastor
- Departamento de Biología Vegetal y Ecología, Universidad de Sevilla, Seville, Spain
- Departamento de Biología, IVAGRO, Universidad de Cádiz, Campus de Excelencia Internacional Agroalimentario (CeiA3), Cádiz, Spain
- Modesto Luceño
- Departamento de Biología Molecular e Ingeniería Bioquímica, Universidad Pablo de Olavide, Seville, Spain
- Marcial Escudero
- Departamento de Biología Vegetal y Ecología, Universidad de Sevilla, Seville, Spain
3
Zheng Y, Lin C, Wang WJ, Wang L, Qian Y, Mao L, Li B, Lou L, Mao Y, Li N, Zheng J, Jiang N, He C, Wang Q, Zhou Q, Chen F, Jin F. Post-implantation analysis of genomic variations in the progeny from developing fetus to birth. Hum Genomics 2024; 18:79. [PMID: 39010135 PMCID: PMC11247737 DOI: 10.1186/s40246-024-00634-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 06/06/2024] [Indexed: 07/17/2024]
Abstract
Genomic variation in offspring after implantation has rarely been studied. In this study, we aim to investigate the extent of de novo mutations in humans from the developing fetus to birth. Using high-depth whole-genome sequencing, 443 parent-offspring trios were studied to compare de novo mutations (DNMs) between different groups. The focus was on fetuses and newborns, with DNA samples obtained from the families' blood and the aspirated embryonic tissues subjected to deep sequencing. The average number of total DNMs in the newborn group was 56.26 (54.17-58.35), which appeared lower than that in the multifetal reduction group, 76.05 (69.70-82.40) (F = 2.42, P = 0.12). However, after adjusting for parental age and maternal pre-pregnancy body mass index (BMI), significant differences were found between the two groups. The analysis was further divided into single nucleotide variants (SNVs) and insertions/deletions of a small number of bases (indels). The average number of de novo SNVs in the multifetal reduction group and the newborn group was 49.89 (45.59-54.20) and 51.09 (49.22-52.96), respectively, with no significant difference between the groups (F = 1.01, P = 0.32). However, a significant difference was observed for de novo indels, with a higher average number in the multifetal reduction group than in the newborn group (F = 194.17, P < 0.001); the averages were 26.26 (23.27-29.05) and 5.17 (4.82-5.52), respectively. To conclude, the number of de novo indels in newborns is significantly lower than in aspirated embryonic tissues (7-9 weeks). This phenomenon is evident across all genomic regions, highlighting the adverse effects of de novo indels on the fetus and emphasizing the significance of embryonic implantation and intrauterine growth in human genetic selection mechanisms.
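The DNM counts above come from comparing each offspring with both parents. This is a minimal sketch of the core trio logic. It assumes genotypes are already called and stored as VCF-style allele-index tuples (0 = REF, 1 = ALT).

- A candidate de novo variant is one where the offspring carries an allele seen in neither parent.
- Candidates are split into SNVs and indels by REF/ALT length.

The `TrioSite` record and the function names are hypothetical. Real DNM calling, including the high-depth pipeline used in this study, also applies depth, allele-balance and genotype-quality filters, which are omitted here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TrioSite:
    """One biallelic site with genotypes as allele indices (0 = REF, 1 = ALT)."""
    ref: str
    alt: str
    father: Tuple[int, int]
    mother: Tuple[int, int]
    child: Tuple[int, int]


def is_candidate_dnm(site: TrioSite) -> bool:
    """True if the child carries an allele absent from both parents."""
    parental_alleles = set(site.father) | set(site.mother)
    return any(allele not in parental_alleles for allele in site.child)


def variant_class(site: TrioSite) -> str:
    """Classify by REF/ALT length: single-base swap = SNV, length change = indel."""
    if len(site.ref) == 1 and len(site.alt) == 1:
        return "snv"
    if len(site.ref) != len(site.alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_dnms(sites) -> dict:
    """Count candidate de novo variants per class for one trio."""
    counts = Counter(variant_class(s) for s in sites if is_candidate_dnm(s))
    return dict(counts)


sites = [
    TrioSite("A", "G", (0, 0), (0, 0), (0, 1)),   # new SNV in the child
    TrioSite("AT", "A", (0, 0), (0, 0), (0, 1)),  # new 1-bp deletion in the child
    TrioSite("C", "T", (0, 1), (0, 0), (0, 1)),   # inherited from the father
    TrioSite("G", "GA", (0, 0), (0, 0), (0, 0)),  # child carries no ALT allele
]
print(count_dnms(sites))
```

Tallying SNVs and indels separately per trio is what allows the two classes to be compared between the multifetal reduction and newborn groups.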
Affiliation(s)
- Yingming Zheng
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Chuanping Lin
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Reproductive Medical Center, the Second Affiliated Hospital of Wenzhou Medical College and Yuying Children's Hospital, Wenzhou, Zhejiang, 325027, China
- Liya Wang
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Yeqing Qian
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Luna Mao
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Baohua Li
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Lijun Lou
- Affiliated Dongyang Hospital of Wenzhou Medical University, Dongyang, Zhejiang, 322100, China
- Yuchan Mao
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Na Li
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Jiayong Zheng
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Nan Jiang
- Reproductive Medical Center, the First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310003, China
- Chaying He
- Hangzhou Women's Hospital (Hangzhou Maternity and Child Health Care Hospital), Hangzhou, Zhejiang, 310008, China
- Qijing Wang
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
- Qing Zhou
- BGI Research, Shenzhen, Guangdong, 518083, China
- Fang Chen
- BGI Research, Shenzhen, Guangdong, 518083, China
- Fan Jin
- Department of Reproductive Endocrinology, Key Laboratory of Reproductive Genetics of National Ministry of Education, Women's Reproductive Health Laboratory of Zhejiang Province, Women's Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Hangzhou, Zhejiang, 310006, China
4
Ma Y, Li Y, Sun J, Liang Q, Wu R, Ding Q, Dai J. Complete F9 Gene Deletion, Duplication, and Triplication Rearrangements: Implications for Factor IX Expression and Clinical Phenotypes. Thromb Haemost 2024; 124:374-385. [PMID: 38011862 DOI: 10.1055/a-2217-9837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
BACKGROUND Factor IX (FIX) plays a critical role in blood coagulation. Complete deletion of F9 results in severe hemophilia B, whereas the clinical implications of complete F9 duplication and triplication remain understudied.
OBJECTIVE To investigate the rearrangement mechanisms underlying complete F9 deletion (cases 1 and 2), duplication (cases 3 and 4), and triplication (case 5), and to explore their association with FIX expression levels and clinical impacts.
METHODS Plasma FIX levels were detected using antigen and activity assays. CNVplex technology, optical genome mapping, and long-distance polymerase chain reaction were employed to characterize the breakpoints of the chromosomal rearrangements.
RESULTS Cases 1 and 2 exhibited FIX activities below 1%. Case 3 displayed FIX activities within the reference range. However, cases 4 and 5 showed a significant increase in FIX activities. Alu-mediated nonallelic homologous recombination was identified as the cause of F9 deletion in case 1; FoSTeS/MMBIR (Fork Stalling and Template Switching/microhomology-mediated break-induced replication) contributed to both F9 deletion and tandem duplication observed in cases 2 and 3; BIR/MMBIR (break-induced replication/microhomology-mediated break-induced replication) mediated by the same pair of low-copy repeats results in similar duplication-triplication/inversion-duplication (DUP-TRP/INV-DUP) rearrangements in cases 4 and 5, leading to complete F9 duplication and triplication, respectively.
CONCLUSION Large deletions involving the F9 gene exhibit no apparent pattern, and the extra-hematologic clinical phenotypes require careful analysis of other genes within the deletion. The impact of complete F9 duplication and triplication on FIX expression might depend on the integrity of the F9 upstream sequence and the specific rearrangement mechanisms. Notably, DUP-TRP/INV-DUP rearrangements significantly elevate FIX activity and are closely associated with thrombotic phenotypes.
Affiliation(s)
- YuXin Ma
- Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Department of Laboratory Medicine, Shanghai Jiaotong University School of Medicine, Ruijin Hospital, Shanghai, China
- Yang Li
- Department of Laboratory Medicine, Shanghai Jiaotong University School of Medicine, Ruijin Hospital, Shanghai, China
- Jie Sun
- Haemophilia Comprehensive Care Center, Capital Medical University, Beijing Children's Hospital, Beijing, China
- Qian Liang
- Department of Laboratory Medicine, Shanghai Jiaotong University School of Medicine, Ruijin Hospital, Shanghai, China
- Runhui Wu
- Haemophilia Comprehensive Care Center, Capital Medical University, Beijing Children's Hospital, Beijing, China
- Qiulan Ding
- Department of Laboratory Medicine, Shanghai Jiaotong University School of Medicine, Ruijin Hospital, Shanghai, China
- Collaborative Innovation Center of Hematology, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Jing Dai
- Department of Laboratory Medicine, Shanghai Jiaotong University School of Medicine, Ruijin Hospital, Shanghai, China
- Department of Laboratory Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
5
Maroilley T, Li X, Oldach M, Jean F, Stasiuk SJ, Tarailo-Graovac M. Deciphering complex genome rearrangements in C. elegans using short-read whole genome sequencing. Sci Rep 2021; 11:18258. [PMID: 34521941 PMCID: PMC8440550 DOI: 10.1038/s41598-021-97764-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/06/2021] [Accepted: 08/30/2021] [Indexed: 12/14/2022]
Abstract
Genomic rearrangements cause congenital disorders, cancer, and complex diseases in humans. Yet they are still understudied in rare diseases because their detection is challenging, despite the advent of whole genome sequencing (WGS) technologies. Short-read (srWGS) and long-read WGS approaches are regularly compared, and the latter is commonly recommended in studies focusing on genomic rearrangements. However, srWGS is currently the most economical, accurate, and widely supported technology. In Caenorhabditis elegans (C. elegans), such variants, induced by various mutagenesis processes, have been used for decades to balance large genomic regions by preventing chromosomal crossover events and allowing the maintenance of lethal mutations. Interestingly, those chromosomal rearrangements have rarely been characterized at the molecular level. To evaluate the ability of srWGS to detect various types of complex genomic rearrangements, we sequenced three balancer strains using short-read Illumina technology. By experimentally validating the breakpoints uncovered by srWGS, we showed that, by combining several types of analyses, srWGS enables the detection of a reciprocal translocation (eT1), a free duplication (sDp3), a large deletion (sC4), and chromoanagenesis events. Thus, applying srWGS to decipher real complex genomic rearrangements in model organisms may help in designing efficient bioinformatics pipelines for the systematic detection of complex rearrangements in human genomes.
Affiliation(s)
- Tatiana Maroilley
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Xiao Li
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Matthew Oldach
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Francesca Jean
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Susan J Stasiuk
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Maja Tarailo-Graovac
- Departments of Biochemistry, Molecular Biology and Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, T2N 4N1, Canada
6
Russell LE, Zhou Y, Almousa AA, Sodhi JK, Nwabufo CK, Lauschke VM. Pharmacogenomics in the era of next generation sequencing - from byte to bedside. Drug Metab Rev 2021; 53:253-278. [PMID: 33820459 DOI: 10.1080/03602532.2021.1909613] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 02/06/2023]
Abstract
Pharmacogenetic research has resulted in the identification of a multitude of genetic variants that impact drug response or toxicity. These polymorphisms are mostly common and have been included as actionable information in the labels of numerous drugs. In addition to common variants, recent advances in Next Generation Sequencing (NGS) technologies have resulted in the identification of a plethora of rare and population-specific pharmacogenetic variations with unclear functional consequences that are not accessible by conventional forward genetics strategies. In this review, we discuss how comprehensive sequencing information can be translated into personalized pharmacogenomic advice in the age of NGS. Specifically, we provide an update of the functional impacts of rare pharmacogenetic variability and how this information can be leveraged to improve pharmacogenetic guidance. Furthermore, we critically discuss the current status of implementation of pharmacogenetic testing across drug development and layers of care. We identify major gaps and provide perspectives on how these can be minimized to optimize the utilization of NGS data for personalized clinical decision-support.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
- Ahmed A Almousa
- Department of Pharmacy, London Health Sciences Center, Victoria Hospital, London, ON, Canada
- Jasleen K Sodhi
- Department of Bioengineering and Therapeutic Sciences, Schools of Pharmacy and Medicine, University of California San Francisco, San Francisco, CA, USA
- Department of Drug Metabolism and Pharmacokinetics, Plexxikon, Inc., Berkeley, CA, USA
- Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
7
Identifying NAHR mechanism between two distinct Alu elements through breakpoint junction mapping by NGS. Meta Gene 2020. [DOI: 10.1016/j.mgene.2020.100702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
8
Zhang L, Zhou X, Weng Z, Sidow A. De novo diploid genome assembly for genome-wide structural variant detection. NAR Genom Bioinform 2019; 2:lqz018. [PMID: 33575568 PMCID: PMC7671403 DOI: 10.1093/nargab/lqz018] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/08/2019] [Revised: 10/09/2019] [Accepted: 12/02/2019] [Indexed: 12/30/2022]
Abstract
Detection of structural variants (SVs) on the basis of read alignment to a reference genome remains a difficult problem. De novo assembly, traditionally used to generate reference genomes, offers an alternative for SV detection. However, it has not been applied broadly to human genomes because of fundamental limitations of short-fragment approaches and high cost of long-read technologies. We here show that 10× linked-read sequencing supports accurate SV detection. We examined variants in six de novo 10× assemblies with diverse experimental parameters from two commonly used human cell lines: NA12878 and NA24385. The assemblies are effective for detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies’ contigs to the reference (hg38). Our study also shows that the base-pair level SV breakpoint accuracy is high, with a majority of SVs having precisely correct sizes and breakpoints. Setting the ancestral state of SV loci by comparing to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation. In about half of cases, the mechanism is the opposite of the reference-based call. We uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimp. Overall, we show that de novo assembly of 10× linked-read data can achieve cost-effective SV detection for personal genomes.
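The polarization step described above can be sketched as a small decision rule. The sketch assumes each SV has been reduced to presence or absence of the variable segment in two places: the reference (hg38) and an aligned ape ortholog. It also assumes the ortholog reflects the ancestral state. That assumption can fail for polymorphisms that predate the human-chimpanzee split. The function names are hypothetical, and this is not the authors' code.

```python
from typing import Optional


def reference_based_call(ref_has_segment: bool) -> str:
    """Label an SV relative to the reference, as an alignment-based caller would:
    a sample lacking a reference segment is a 'deletion', a sample carrying
    extra sequence is an 'insertion'."""
    return "deletion" if ref_has_segment else "insertion"


def ancestral_event(outgroup_has_segment: Optional[bool]) -> str:
    """Label the mutation that produced the derived allele, assuming the outgroup
    carries the ancestral state. The two alleles differ only by presence or
    absence of the segment, so the derived allele is whichever state the
    outgroup lacks."""
    if outgroup_has_segment is None:
        return "unknown"  # no usable orthologous sequence
    return "deletion" if outgroup_has_segment else "insertion"


def call_is_reversed(ref_has_segment: bool,
                     outgroup_has_segment: Optional[bool]) -> Optional[bool]:
    """True when the polarized event contradicts the reference-based label,
    None when the ancestral state cannot be determined."""
    event = ancestral_event(outgroup_has_segment)
    if event == "unknown":
        return None
    return event != reference_based_call(ref_has_segment)


# Reference carries a segment that the sample lacks, and the chimpanzee
# ortholog also lacks it: the reference allele is the derived insertion.
print(reference_based_call(True), ancestral_event(False), call_is_reversed(True, False))
```

This rule shows how a reference-based "deletion" can turn out to be an insertion on the reference lineage. The abstract reports that this reversal occurs in about half of the cases.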
Affiliation(s)
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ziming Weng
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
- Arend Sidow
- Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, 300 Pasteur Dr, Stanford University, Stanford, CA 94305, USA
9
Wheeler NJ, Dinguirard N, Marquez J, Gonzalez A, Zamanian M, Yoshino TP, Castillo MG. Sequence and structural variation in the genome of the Biomphalaria glabrata embryonic (Bge) cell line. Parasit Vectors 2018; 11:496. [PMID: 30180879 PMCID: PMC6122571 DOI: 10.1186/s13071-018-3059-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/02/2018] [Accepted: 08/13/2018] [Indexed: 12/15/2022]
Abstract
Background The aquatic pulmonate snail Biomphalaria glabrata is a significant vector and laboratory host for the parasitic flatworm Schistosoma mansoni, an etiological agent for the neglected tropical disease schistosomiasis. Much is known regarding the host-parasite interactions of these two organisms, and the B. glabrata embryonic (Bge) cell line has been an invaluable resource in these studies. The B. glabrata BB02 genome sequence was recently released, but nothing is known of the sequence variation between this reference and the Bge cell genome, which has likely accumulated substantial genetic variation in the ~50 years since its isolation.
Results Here, we report the genome sequence of our laboratory subculture of the Bge cell line (designated Bge3), which we mapped to the B. glabrata BB02 reference genome. Single nucleotide variants (SNVs) were predicted and focus was given to those SNVs that are most likely to affect the structure or expression of protein-coding genes. Furthermore, we have highlighted and validated high-impact SNVs in genes that have often been studied using Bge cells as an in vitro model, and other genes that may have contributed to the immortalization of this cell line. We also resolved representative karyotypes for the Bge3 subculture, which revealed a mixed population exhibiting substantial aneuploidy, in line with previous reports from other Bge subcultures.
Conclusions The Bge3 genome differs from the B. glabrata BB02 reference genome in both sequence and structure, and these are likely to have significant biological effects. The availability of the Bge3 genome sequence, and an awareness of genomic differences with B. glabrata, will inform the design of experiments to understand gene function in this unique in vitro snail cell model. Additionally, this resource will aid in the development of new technologies and molecular approaches that promise to reveal more about this schistosomiasis-transmitting snail vector.
Affiliation(s)
- Nicolas J Wheeler
- Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin, Madison, WI, USA
- Nathalie Dinguirard
- Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin, Madison, WI, USA
- Joshua Marquez
- Department of Biology, New Mexico State University, Las Cruces, NM, USA
- Adrian Gonzalez
- Department of Biology, New Mexico State University, Las Cruces, NM, USA
- Mostafa Zamanian
- Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin, Madison, WI, USA
- Timothy P Yoshino
- Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin, Madison, WI, USA
- Maria G Castillo
- Department of Biology, New Mexico State University, Las Cruces, NM, USA
10
Vicente-Salvador D, Puig M, Gayà-Vidal M, Pacheco S, Giner-Delgado C, Noguera I, Izquierdo D, Martínez-Fundichely A, Ruiz-Herrera A, Estivill X, Aguado C, Lucas-Lledó JI, Cáceres M. Detailed analysis of inversions predicted between two human genomes: errors, real polymorphisms, and their origin and population distribution. Hum Mol Genet 2017; 26:567-581. [PMID: 28025331 DOI: 10.1093/hmg/ddw415] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/01/2016] [Accepted: 12/05/2016] [Indexed: 12/11/2022]
Abstract
The growing catalogue of structural variants in humans often overlooks inversions as one of the most difficult types of variation to study, even though they affect phenotypic traits in diverse organisms. Here, we have analysed in detail 90 inversions predicted from the comparison of two independently assembled human genomes: the reference genome (NCBI36/HG18) and HuRef. Surprisingly, we found that two thirds of these predictions (62) represent errors either in assembly comparison or in one of the assemblies, including 27 misassembled regions in HG18. Next, we validated 22 of the remaining 28 potential polymorphic inversions using different PCR techniques and characterized their breakpoints and ancestral state. In addition, we determined experimentally the derived allele frequency in Europeans for 17 inversions (DAF = 0.01-0.80), as well as the distribution in 14 worldwide populations for 12 of them based on the 1000 Genomes Project data. Among the validated inversions, nine have inverted repeats (IRs) at their breakpoints, and two show nucleotide variation patterns consistent with a recurrent origin. Conversely, inversions without IRs have a unique origin and almost all of them show deletions or insertions at the breakpoints in the derived allele mediated by microhomology sequences, which highlights the importance of mechanisms like FoSTeS/MMBIR in the generation of complex rearrangements in the human genome. Finally, we found several inversions located within genes and at least one candidate to be positively selected in Africa. Thus, our study emphasizes the importance of careful analysis and validation of large-scale genomic predictions to extract reliable biological conclusions.
Affiliation(s)
- David Vicente-Salvador
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Marta Puig
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Magdalena Gayà-Vidal
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Sarai Pacheco
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Carla Giner-Delgado
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Isaac Noguera
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- David Izquierdo
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Aurora Ruiz-Herrera
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Departament de Biologia Celular, Fisiologia i Immunologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Xavier Estivill
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Cristina Aguado
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- José Ignacio Lucas-Lledó
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Paterna (València), Spain
- Mario Cáceres
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- ICREA, Barcelona, Spain
11
Cherukuri Y, Janga SC. Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches. BMC Genomics 2016; 17 Suppl 7:507. [PMID: 27556636 PMCID: PMC5001211 DOI: 10.1186/s12864-016-2895-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
Abstract
Background: Improved DNA sequencing methods have transformed the field of genomics over the last decade. This has been made possible by the development of inexpensive short-read sequencing technologies, which have now resulted in three generations of sequencing platforms. More recently, a fourth generation of nanopore-based single-molecule sequencing technology was developed, based on the MinION® sequencer, which is portable, inexpensive and fast. It is capable of generating reads longer than 100 kb. Although it has many specific advantages, the two major limitations of MinION reads are high error rates and the need for the development of downstream pipelines. Algorithms for error correction have already emerged, while the development of pipelines is still at a nascent stage. Results: In this study, we benchmarked available assembler algorithms to find an appropriate framework that can efficiently assemble Nanopore-sequenced reads. To address this, we employed genome-scale Nanopore-sequenced datasets available for the E. coli and yeast genomes, respectively. In order to comprehensively evaluate multiple algorithmic frameworks, we included assemblers based on de Bruijn graph (Velvet and ABySS), Overlap Layout Consensus (OLC) (Celera) and greedy extension (SSAKE) approaches. We analyzed the quality and accuracy of the assemblies as well as the computational performance of each of the assemblers included in our benchmark. Our analysis revealed that the OLC-based algorithm, Celera, could generate a high-quality assembly with ten times higher N50 and mean contig values and one-fifth the total number of contigs compared to other tools. Celera was also found to exhibit an average genome coverage of 12 % in the E. coli dataset and 70 % in the yeast dataset, as well as relatively shorter run times.
In contrast, the de Bruijn graph-based assemblers Velvet and ABySS generated assemblies of moderate quality, in less time when memory allocation was not limited, while the greedy extension-based algorithm SSAKE generated an assembly of very poor quality but with a genome coverage of 90 % on the yeast dataset. Conclusion: OLC can be considered a favorable algorithmic framework for the development of assembler tools for Nanopore-based data, followed by de Bruijn graph-based algorithms, as they consume similar or relatively shorter run times than OLC-based algorithms for generating assemblies, irrespective of the memory allocated for the task. However, a few improvements must be made to the existing de Bruijn implementations in order to generate an assembly of reasonable quality. Our findings should help stimulate the development of novel assemblers for handling Nanopore sequence data. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2895-8) contains supplementary material, which is available to authorized users.
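The abstract ranks assemblers partly by N50, mean contig length and total number of contigs. As a minimal sketch (not the benchmarking code used in the study), these three statistics can be computed from a list of contig lengths. The function name `assembly_stats` is hypothetical, and N50 follows the common definition: the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length.

```python
from typing import List, Tuple


def assembly_stats(contig_lengths: List[int]) -> Tuple[int, float, int]:
    """Return (N50, mean contig length, number of contigs) for one assembly.

    Hypothetical helper for illustration; lengths are in base pairs.
    """
    if not contig_lengths:
        raise ValueError("assembly must contain at least one contig")

    # Walk contigs from longest to shortest, accumulating their lengths.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    cumulative = 0
    n50 = lengths[-1]
    for length in lengths:
        cumulative += length
        # N50 is the contig at which half of the total length is first covered.
        if 2 * cumulative >= total:
            n50 = length
            break

    mean_length = total / len(lengths)
    return n50, mean_length, len(lengths)


if __name__ == "__main__":
    print(assembly_stats([100, 200, 300, 400]))  # (300, 250.0, 4)
```

For contigs of 100, 200, 300 and 400 bp (1,000 bp in total), the two longest contigs together reach 700 bp, so N50 is 300 bp and the mean contig length is 250 bp. A higher N50 with fewer contigs generally indicates a more contiguous assembly, which is the sense in which the abstract compares the tools.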
Affiliation(s)
- Yesesri Cherukuri
- Department of Bio Health Informatics, School of Informatics and Computing, Indiana University Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN, 46202, USA
- Sarath Chandra Janga
- Department of Bio Health Informatics, School of Informatics and Computing, Indiana University Purdue University, 719 Indiana Ave Ste 319, Walker Plaza Building, Indianapolis, IN, 46202, USA; Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), 410 West 10th Street, Indianapolis, IN, 46202, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Medical Research and Library Building, 975 West Walnut Street, Indianapolis, IN, 46202, USA.
12
Next-generation sequencing-based detection of germline L1-mediated transductions. BMC Genomics 2016; 17:342. [PMID: 27161561 PMCID: PMC4862182 DOI: 10.1186/s12864-016-2670-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/11/2015] [Accepted: 04/26/2016] [Indexed: 01/01/2023]
Abstract
Background: Active LINE-1 (L1) elements can mobilize flanking sequences to different genomic loci through a process termed transduction, which influences genomic content and structure; however, an approach for detecting polymorphic germline non-reference transductions in massively parallel sequencing data has been lacking. Results: Here we present the computational approach TIGER (Transduction Inference in GERmline genomes), which enables the discovery of non-reference L1-mediated transductions by combining L1 discovery with detection of unique insertion sequences and detailed characterization of insertion sites. We employed TIGER to characterize polymorphic transductions in fifteen genomes from non-human primate species (chimpanzee, orangutan and rhesus macaque), as well as in a human genome. We achieved high accuracy, as confirmed by PCR and two single-molecule DNA sequencing techniques, and uncovered differences in relative rates of transduction between primate species. Conclusions: By enabling detection of polymorphic transductions, TIGER makes this form of relevant structural variation amenable to population and personal genome analysis. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2670-x) contains supplementary material, which is available to authorized users.
13
Kronenberg ZN, Osborne EJ, Cone KR, Kennedy BJ, Domyan ET, Shapiro MD, Elde NC, Yandell M. Wham: Identifying Structural Variants of Biological Consequence. PLoS Comput Biol 2015; 11:e1004572. [PMID: 26625158 PMCID: PMC4666669 DOI: 10.1371/journal.pcbi.1004572] [Citation(s) in RCA: 92] [Impact Index Per Article: 9.2] [Received: 05/11/2015] [Accepted: 09/30/2015] [Indexed: 11/22/2022]
Abstract
Existing methods for identifying structural variants (SVs) from short-read datasets are inaccurate. This complicates disease-gene identification and efforts to understand the consequences of genetic variation. In response, we have created Wham (Whole-genome Alignment Metrics) to provide a single, integrated framework for both structural variant calling and association testing, thereby bypassing many of the difficulties that currently frustrate attempts to employ SVs in association testing. Here we describe Wham, benchmark it against three other widely used SV identification tools (Lumpy, Delly and SoftSearch), and demonstrate Wham's ability to identify and associate SVs with phenotypes using data from humans, domestic pigeons, and vaccinia virus. Wham and all associated software are covered under the MIT License and can be freely downloaded from GitHub (https://github.com/zeeev/wham), with documentation on a wiki (http://zeeev.github.io/wham/). For community support, please post questions to https://www.biostars.org/.
Affiliation(s)
- Zev N. Kronenberg
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
- Edward J. Osborne
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, United States of America
- Kelsey R. Cone
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
- Brett J. Kennedy
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, United States of America
- Eric T. Domyan
- Department of Biology, University of Utah, Salt Lake City, Utah, United States of America
- Michael D. Shapiro
- Department of Biology, University of Utah, Salt Lake City, Utah, United States of America
- Nels C. Elde
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
- Mark Yandell
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, United States of America
14
Abstract
Polymorphic inversions are a type of structural variant that is difficult to analyze owing to their balanced nature and the location of their breakpoints within complex repeated regions. So far, only a handful of inversions have been studied in detail in humans, and current knowledge about their possible functional effects is still limited. However, inversions have been related to phenotypic changes and adaptation in multiple species. In this review, we summarize the evidence for the functional impact of inversions in the human genome. First, given that inversions have been shown to inhibit recombination in heterokaryotes, chromosomes displaying different orientations are expected to evolve independently, and this may lead to distinct gene-expression patterns. Second, inversions have a role as disease-causing mutations, both by directly affecting gene structure or regulation in different ways and by predisposing to other secondary rearrangements in the offspring of inversion carriers. Finally, several inversions show signals of selection during human evolution. These findings illustrate the potential of inversions to have phenotypic consequences in humans as well, and emphasize the importance of including them in genome-wide association studies.
15
Startek M, Szafranski P, Gambin T, Campbell IM, Hixson P, Shaw CA, Stankiewicz P, Gambin A. Genome-wide analyses of LINE-LINE-mediated nonallelic homologous recombination. Nucleic Acids Res 2015; 43:2188-98. [PMID: 25613453 PMCID: PMC4344489 DOI: 10.1093/nar/gku1394] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Indexed: 01/08/2023]
Abstract
Nonallelic homologous recombination (NAHR), occurring between low-copy repeats (LCRs) >10 kb in size and sharing >97% DNA sequence identity, is responsible for the majority of recurrent genomic rearrangements in the human genome. Recent studies have shown that transposable elements (TEs) can also mediate recurrent deletions and translocations, indicating that the features of substrates that mediate NAHR may be significantly less stringent than previously believed. Using >4 kb length and >95% sequence identity criteria, we analyzed the genome-wide distribution of long interspersed element (LINE) retrotransposons and their potential to mediate NAHR. We identified 17 005 directly oriented LINE pairs located <10 Mbp from each other as potential NAHR substrates, placing 82.8% of the human genome at risk of LINE-LINE-mediated instability. Cross-referencing these regions with CNVs in the Baylor College of Medicine clinical chromosomal microarray database of 36 285 patients, we identified 516 CNVs potentially mediated by LINEs. Using long-range PCR of five different genomic regions in a total of 44 patients, we confirmed that the CNV breakpoints in each patient map within the LINE elements. To further assess the scale of the LINE-LINE/NAHR phenomenon in the human genome, we tested DNA samples from six healthy individuals on a custom aCGH microarray targeting LINE elements predicted to mediate CNVs and identified 25 LINE-LINE rearrangements. Our data indicate that LINE-LINE-mediated NAHR is widespread and under-recognized, and is an important mechanism of structural rearrangement contributing to human genomic variability.
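The substrate criteria stated in this abstract (LINE copies >4 kb long, >95% sequence identity, direct orientation, and <10 Mbp apart) amount to a pairwise filter. The sketch below is a hypothetical illustration, not the authors' pipeline: `RepeatCopy` and `candidate_nahr_pairs` are invented names, pairwise identities are assumed to be precomputed (for example from pairwise alignments), and "distance" is assumed to mean the gap between the two copies on the same chromosome.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RepeatCopy:
    """One annotated LINE copy (hypothetical record; half-open coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def candidate_nahr_pairs(
    copies: List[RepeatCopy],
    identity: Dict[Tuple[int, int], float],
    min_length: int = 4_000,
    min_identity: float = 0.95,
    max_distance: int = 10_000_000,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, that pass all substrate criteria.

    `identity` maps (i, j) to a precomputed pairwise sequence identity in [0, 1];
    pairs missing from it are treated as having no detectable identity.
    """
    pairs = []
    for i, j in combinations(range(len(copies)), 2):
        a, b = copies[i], copies[j]
        # Direct orientation: same chromosome and same strand.
        if a.chrom != b.chrom or a.strand != b.strand:
            continue
        # Both copies must be longer than the length threshold.
        if a.end - a.start <= min_length or b.end - b.start <= min_length:
            continue
        # Sequence identity must exceed the identity threshold.
        if identity.get((i, j), 0.0) <= min_identity:
            continue
        # Distance taken as the gap between the two copies (an assumption).
        gap = max(a.start, b.start) - min(a.end, b.end)
        if gap >= max_distance:
            continue
        pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    copies = [
        RepeatCopy("chr1", 0, 6_000, "+"),
        RepeatCopy("chr1", 1_000_000, 1_006_000, "+"),
        RepeatCopy("chr1", 2_000_000, 2_006_000, "-"),
        RepeatCopy("chr1", 20_000_000, 20_006_000, "+"),
    ]
    identity = {(0, 1): 0.97, (0, 2): 0.99, (0, 3): 0.99}
    print(candidate_nahr_pairs(copies, identity))  # [(0, 1)]
```

In the usage example, pair (0, 2) is rejected because the copies are inverted relative to each other, and pair (0, 3) because the copies lie almost 20 Mbp apart. The all-pairs loop is quadratic in the number of copies; a genome-wide scan would more plausibly sort copies by position and compare only copies within the distance window.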
Affiliation(s)
- Michał Startek
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, 2 Banacha street, 02-097 Warsaw, Poland
- Przemyslaw Szafranski
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Tomasz Gambin
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Ian M Campbell
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Patricia Hixson
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Chad A Shaw
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Paweł Stankiewicz
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Anna Gambin
- Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw, 2 Banacha street, 02-097 Warsaw, Poland; Mossakowski Medical Research Centre, Polish Academy of Sciences, 5 Pawińskiego street, 02-106 Warsaw, Poland
16
Belizário JE. The humankind genome: from genetic diversity to the origin of human diseases. Genome 2014; 56:705-16. [PMID: 24433206 DOI: 10.1139/gen-2013-0125] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023]
Abstract
Genome-wide association studies have failed to establish common variant risk for the majority of common human diseases. Recent studies resequencing and comparing over 1200 human genomes and 10 000 exomes, together with the delineation of DNA methylation patterns (epigenome) and the full characterization of transcribed coding and noncoding RNAs (transcriptome), help explain the underlying reasons for this failure. These studies have provided the most comprehensive catalogues of functional elements and genetic variants now available for global integrative analysis and experimental validation in prospective cohort studies. With these datasets, researchers will have unparalleled opportunities for the alignment, mining, and testing of hypotheses about the roles of specific genetic variants, including copy number variations, single nucleotide polymorphisms, and indels, as causes of specific phenotypes and diseases. Through the use of next-generation sequencing technologies for genotyping and standardized ontological annotation to systematically analyze the effects of genomic variation on human and model organism phenotypes, we will be able to find candidate genes and new clues to disease etiology and treatment. This article describes essential concepts in genetics and genomic technologies, as well as the emerging computational framework for comprehensively searching the websites and platforms available for the analysis and interpretation of genomic data.
Affiliation(s)
- Jose E Belizário
- Departamento de Farmacologia, Instituto de Ciências Biomédicas da Universidade de São Paulo, Avenida Lineu Prestes, 1524 CEP 05508-900, São Paulo, SP, Brazil
17
Trappe K, Emde AK, Ehrlich HC, Reinert K. Gustaf: Detecting and correctly classifying SVs in the NGS twilight zone. Bioinformatics 2014; 30:3484-90. [PMID: 25028727 DOI: 10.1093/bioinformatics/btu431] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The landscape of structural variation (SV), including complex duplication and translocation patterns, is far from resolved. SV detection tools usually exhibit low agreement, are often geared toward certain types or size ranges of variation and struggle to correctly classify the type and exact size of SVs. RESULTS: We present Gustaf (Generic mUlti-SpliT Alignment Finder), a sound generic multi-split SV detection tool that detects and classifies deletions, inversions, dispersed duplications and translocations of ≥ 30 bp. Our approach is based on a generic multi-split alignment strategy that can identify SV breakpoints with base pair resolution. We show that Gustaf correctly identifies SVs, especially in the range from 30 to 100 bp, which we call the next-generation sequencing (NGS) twilight zone of SVs, as well as larger SVs >500 bp. Gustaf performs better than similar tools in our benchmark and is furthermore able to correctly identify the size and location of dispersed duplications and translocations, which might otherwise be wrongly classified, for example, as large deletions.
Affiliation(s)
- Kathrin Trappe
- Department of Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany; New York Genome Center, New York, NY 10013, USA
- Anne-Katrin Emde
- Department of Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany; New York Genome Center, New York, NY 10013, USA
- Hans-Christian Ehrlich
- Department of Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany; New York Genome Center, New York, NY 10013, USA
- Knut Reinert
- Department of Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany; New York Genome Center, New York, NY 10013, USA
18
Quek K, Nones K, Patch AM, Fink JL, Newell F, Cloonan N, Miller D, Fadlullah MZH, Kassahn K, Christ AN, Bruxner TJC, Manning S, Harliwong I, Idrisoglu S, Nourse C, Nourbakhsh E, Wani S, Steptoe A, Anderson M, Holmes O, Leonard C, Taylor D, Wood S, Xu Q, Wilson P, Biankin AV, Pearson JV, Waddell N, Grimmond SM. A workflow to increase verification rate of chromosomal structural rearrangements using high-throughput next-generation sequencing. Biotechniques 2014; 57:31-8. [PMID: 25005691 DOI: 10.2144/000114189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2014] [Accepted: 06/12/2014] [Indexed: 11/23/2022]
Abstract
Somatic rearrangements, which are commonly found in human cancer genomes, contribute to the progression and maintenance of cancers. Conventionally, the verification of somatic rearrangements comprises many manual steps and Sanger sequencing. This is labor intensive when verifying a large number of rearrangements in a large cohort. To increase the verification throughput, we devised a high-throughput workflow that utilizes benchtop next-generation sequencing and in-house bioinformatics tools to link the laboratory processes. In the proposed workflow, primers are automatically designed. PCR and an optional gel electrophoresis step to confirm the somatic nature of the rearrangements are performed. PCR products of somatic events are pooled for Ion Torrent PGM and/or Illumina MiSeq sequencing, the resulting sequence reads are assembled into consensus contigs by a consensus assembler, and an automated BLAT is used to resolve the breakpoints to base level. We compared sequences and breakpoints of verified somatic rearrangements between the conventional and high-throughput workflow. The results showed that next-generation sequencing methods are comparable to conventional Sanger sequencing. The identified breakpoints obtained from next-generation sequencing methods were highly accurate and reproducible. Furthermore, the proposed workflow allows hundreds of events to be processed in a shorter time frame compared with the conventional workflow.
Affiliation(s)
- Kelly Quek
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Katia Nones
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Ann-Marie Patch
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- J Lynn Fink
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Felicity Newell
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Nicole Cloonan
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- David Miller
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Muhammad Z H Fadlullah
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Karin Kassahn
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Angelika N Christ
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Timothy J C Bruxner
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Suzanne Manning
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Ivon Harliwong
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Senel Idrisoglu
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Craig Nourse
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Ehsan Nourbakhsh
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Shivangi Wani
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Anita Steptoe
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Matthew Anderson
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Oliver Holmes
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Conrad Leonard
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Darrin Taylor
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Scott Wood
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Qinying Xu
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Peter Wilson
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Andrew V Biankin
- The Kinghorn Cancer Centre, Cancer Research Program, Garvan Institute of Medical Research, Sydney, NSW, Australia; Department of Surgery, Bankstown Hospital, Sydney, NSW, Australia; South Western Sydney Clinical School, Faculty of Medicine, University of NSW, Liverpool, NSW, Australia; Wolfson Wohl Cancer Research Centre, Institute for Cancer Sciences, University of Glasgow, Glasgow, Scotland, United Kingdom
- John V Pearson
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Nic Waddell
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia
- Sean M Grimmond
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, QLD, Australia; Wolfson Wohl Cancer Research Centre, Institute for Cancer Sciences, University of Glasgow, Glasgow, Scotland, United Kingdom
19
Huang W, Massouras A, Inoue Y, Peiffer J, Ràmia M, Tarone AM, Turlapati L, Zichner T, Zhu D, Lyman RF, Magwire MM, Blankenburg K, Carbone MA, Chang K, Ellis LL, Fernandez S, Han Y, Highnam G, Hjelmen CE, Jack JR, Javaid M, Jayaseelan J, Kalra D, Lee S, Lewis L, Munidasa M, Ongeri F, Patel S, Perales L, Perez A, Pu L, Rollmann SM, Ruth R, Saada N, Warner C, Williams A, Wu YQ, Yamamoto A, Zhang Y, Zhu Y, Anholt RR, Korbel JO, Mittelman D, Muzny DM, Gibbs RA, Barbadilla A, Johnston JS, Stone EA, Richards S, Deplancke B, Mackay TF. Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines. Genome Res 2014; 24:1193-208. [PMID: 24714809 PMCID: PMC4079974 DOI: 10.1101/gr.171546.113] [Citation(s) in RCA: 434] [Impact Index Per Article: 39.5] [Received: 12/20/2013] [Accepted: 04/01/2014] [Indexed: 12/30/2022]
Abstract
The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource of 205 sequenced inbred lines, derived to improve our understanding of the effects of naturally occurring genetic variation on molecular and organismal phenotypes. We used an integrated genotyping strategy to identify 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants. Our molecular population genomic analyses show higher deletion than insertion mutation rates and stronger purifying selection on deletions. Weaker selection on insertions than deletions is consistent with our observed distribution of genome size determined by flow cytometry, which is skewed toward larger genomes. Insertion/deletion and single nucleotide polymorphisms are positively correlated with each other and with local recombination, suggesting that their nonrandom distributions are due to hitchhiking and background selection. Our cytogenetic analysis identified 16 polymorphic inversions in the DGRP. Common inverted and standard karyotypes are genetically divergent and account for most of the variation in relatedness among the DGRP lines. Intriguingly, variation in genome size and many quantitative traits are significantly associated with inversions. Approximately 50% of the DGRP lines are infected with Wolbachia, and four lines have germline insertions of Wolbachia sequences, but effects of Wolbachia infection on quantitative traits are rarely significant. The DGRP complements ongoing efforts to functionally annotate the Drosophila genome. Indeed, 15% of all D. melanogaster genes segregate for potentially damaged proteins in the DGRP, and genome-wide analyses of quantitative traits identify novel candidate genes. The DGRP lines, sequence data, genotypes, quality scores, phenotypes, and analysis and visualization tools are publicly available.
Affiliation(s)
- Wen Huang
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Andreas Massouras
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Yutaka Inoue
- Center for Education in Liberal Arts and Sciences, Osaka University, Osaka-fu, 560-0043 Japan
- Jason Peiffer
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Miquel Ràmia
- Genomics, Bioinformatics and Evolution Group, Institut de Biotecnologia i de Biomedicina (IBB), Department of Genetics and Microbiology, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- Aaron M. Tarone
- Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
- Lavanya Turlapati
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Thomas Zichner
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- Dianhui Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Richard F. Lyman
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Michael M. Magwire
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Kerstin Blankenburg
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Mary Anna Carbone
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Kyle Chang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Lisa L. Ellis
- Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
- Sonia Fernandez
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Yi Han
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Gareth Highnam
- Virginia Tech Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
- Carl E. Hjelmen
- Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
- John R. Jack
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Mehwish Javaid
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Joy Jayaseelan
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Divya Kalra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Sandy Lee
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Lora Lewis
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Mala Munidasa
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Fiona Ongeri
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Shohba Patel
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Lora Perales
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Agapito Perez
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- LingLing Pu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Stephanie M. Rollmann
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Robert Ruth
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Nehad Saada
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Crystal Warner
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Aneisa Williams
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Yuan-Qing Wu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Akihiko Yamamoto
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Yiqing Zhang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Robert R.H. Anholt
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
- David Mittelman
- Virginia Tech Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
- Donna M. Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Antonio Barbadilla
- Genomics, Bioinformatics and Evolution Group, Institut de Biotecnologia i de Biomedicina (IBB), Department of Genetics and Microbiology, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
- J. Spencer Johnston
- Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
- Eric A. Stone
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
- Stephen Richards
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Trudy F.C. Mackay
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
20
Wijaya E, Shimizu K, Asai K, Hamada M. Reference-free prediction of rearrangement breakpoint reads. Bioinformatics 2014; 30:2559-67. [PMID: 24876376 DOI: 10.1093/bioinformatics/btu360] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/28/2022]
Abstract
MOTIVATION Chromosome rearrangement events are triggered by atypical breaking and rejoining of DNA molecules, which are observed in many cancer-related diseases. Rearrangements are typically detected by using short reads generated by next-generation sequencing (NGS) and combining the reads with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, intermediate comparison via a reference genome may lead to loss of information. RESULTS In this article, we propose a reference-free method for detecting clusters of breakpoints from chromosomal rearrangements. This is done by directly comparing a set of normal NGS reads with another set that may be rearranged. Our method SlideSort-BPR (breakpoint reads) is based on a fast algorithm for all-against-all comparisons of short reads and theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100×, it finds ∼88% of the breakpoints correctly with no false-positive reads. Moreover, evaluation on a real prostate cancer dataset shows that the proposed method predicts more fusion transcripts correctly than previous approaches, and yet produces fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome. AVAILABILITY AND IMPLEMENTATION The source code of SlideSort-BPR can be freely downloaded from https://code.google.com/p/slidesort-bpr/.
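The core idea of a reference-free comparison, flagging reads from the possibly rearranged sample whose sequence is not supported by the normal sample, can be illustrated with a minimal Python sketch. This is not SlideSort-BPR's algorithm, which relies on fast all-against-all read comparison and an analysis of neighbor counts; it swaps in a simpler k-mer presence test, and the function names and the parameters `k` and `min_novel` are hypothetical. On real data this naive test would also flag reads carrying sequencing errors, which is one reason to reason about the number of neighboring reads rather than exact matches.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_set(reads: Iterable[str], k: int) -> Set[str]:
    """Union of k-mers over a read set (here, the normal sample)."""
    table: Set[str] = set()
    for read in reads:
        table |= kmers(read, k)
    return table


def candidate_breakpoint_reads(normal: List[str], test: List[str],
                               k: int = 8, min_novel: int = 1) -> List[str]:
    """Return test reads with at least min_novel k-mers absent from all normal reads.

    A read spanning a novel junction contains k-mers that cross the junction,
    so those k-mers are expected to be missing from the normal sample.
    """
    normal_kmers = build_kmer_set(normal, k)
    candidates = []
    for read in test:
        novel = kmers(read, k) - normal_kmers
        if len(novel) >= min_novel:
            candidates.append(read)
    return candidates


# Toy example: the second test read joins the start of one normal read
# to the end of another, as a rearrangement junction would.
normal_reads = ["ACGTACGGTTCA", "GGCATTAGCCAT"]
test_reads = ["ACGTACGGTTCA", "ACGTACGGCCAT"]
print(candidate_breakpoint_reads(normal_reads, test_reads, k=4))  # prints ['ACGTACGGCCAT']
```

Building the normal k-mer set once makes each query read cost proportional to its length rather than to the size of the normal read set, which is the main design choice separating this from a brute-force all-pairs comparison.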
Affiliation(s)
- Edward Wijaya
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Kana Shimizu
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Kiyoshi Asai
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Michiaki Hamada
- Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562 and Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
21
Aguado C, Gayà-Vidal M, Villatoro S, Oliva M, Izquierdo D, Giner-Delgado C, Montalvo V, García-González J, Martínez-Fundichely A, Capilla L, Ruiz-Herrera A, Estivill X, Puig M, Cáceres M. Validation and genotyping of multiple human polymorphic inversions mediated by inverted repeats reveals a high degree of recurrence. PLoS Genet 2014; 10:e1004208. [PMID: 24651690 PMCID: PMC3961182 DOI: 10.1371/journal.pgen.1004208] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 09/17/2013] [Accepted: 01/14/2014] [Indexed: 01/17/2023]
Abstract
In recent years different types of structural variants (SVs) have been discovered in the human genome and their functional impact has become increasingly clear. Inversions, however, are poorly characterized and more difficult to study, especially those mediated by inverted repeats or segmental duplications. Here, we describe the results of a simple and fast inverse PCR (iPCR) protocol for high-throughput genotyping of a wide variety of inversions using a small amount of DNA. In particular, we analyzed 22 inversions predicted in humans ranging from 5.1 kb to 226 kb and mediated by inverted repeat sequences of 1.6-24 kb. First, we validated 17 of the 22 inversions in a panel of nine HapMap individuals from different populations, and we genotyped them in 68 additional individuals of European origin, with correct genetic transmission in ∼12 mother-father-child trios. Global inversion minor allele frequency varied between 1% and 49% and inversion genotypes were consistent with Hardy-Weinberg equilibrium. By analyzing the nucleotide variation and the haplotypes in these regions, we found that only four inversions have linked tag-SNPs and that in many cases there are multiple shared SNPs between standard and inverted chromosomes, suggesting an unexpectedly high degree of inversion recurrence during human evolution. iPCR was also used to check 16 of these inversions in four chimpanzees and two gorillas, and 10 showed both orientations either within or between species, providing additional support for their multiple origin. Finally, we have identified several inversions that include genes in the inverted or breakpoint regions, and at least one disrupts a potential coding gene. Thus, these results represent a significant advance in our understanding of inversion polymorphism in human populations and challenge the common view of a single origin of inversions, with important implications for inversion analysis in SNP-based studies.
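The abstract reports minor allele frequencies and states that inversion genotypes were consistent with Hardy-Weinberg equilibrium. One common way to check this from genotype counts is a chi-square goodness-of-fit test with one degree of freedom; the sketch below assumes that test (the study may have used another, and an exact test is preferable for small samples or rare alleles), and the genotype counts in the usage note are made up for illustration.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions at a biallelic locus.

    Returns (minor allele frequency, chi-square statistic, p-value with 1 df).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip empty expected classes (monomorphic locus) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # 3 genotype classes - 1 - 1 estimated allele frequency = 1 degree of freedom;
    # for 1 df the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return min(p, q), chi2, p_value
```

For example, `hwe_chi_square(36, 48, 16)` gives a minor allele frequency of 0.4 and a chi-square statistic of essentially zero, because those counts match Hardy-Weinberg proportions exactly, whereas a sample with no heterozygotes such as `hwe_chi_square(50, 0, 50)` is strongly rejected.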
Affiliation(s)
- Cristina Aguado
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Magdalena Gayà-Vidal
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Sergi Villatoro
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Meritxell Oliva
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- David Izquierdo
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Carla Giner-Delgado
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Víctor Montalvo
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Judit García-González
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Laia Capilla
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Aurora Ruiz-Herrera
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Departament de Biologia Celular, Fisiologia i Immunologia. Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Xavier Estivill
- Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Marta Puig
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Mario Cáceres
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
22
Martínez-Fundichely A, Casillas S, Egea R, Ràmia M, Barbadilla A, Pantano L, Puig M, Cáceres M. InvFEST, a database integrating information of polymorphic inversions in the human genome. Nucleic Acids Res 2014; 42:D1027-32. [PMID: 24253300 PMCID: PMC3965118 DOI: 10.1093/nar/gkt1122] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 08/15/2013] [Revised: 10/18/2013] [Accepted: 10/24/2013] [Indexed: 12/27/2022]
Abstract
The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with large amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Because of the complexity of this type of change and the underlying high false-positive discovery rate, it is necessary to integrate all the available data to obtain a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into different inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. Therefore, InvFEST aims to represent the most reliable set of human inversions and become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome.
Affiliation(s)
- Alexander Martínez-Fundichely
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Sònia Casillas
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Raquel Egea
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Miquel Ràmia
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Antonio Barbadilla
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Lorena Pantano
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Marta Puig
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Mario Cáceres
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
23
Genetic predisposition in anaesthesia and critical care, science fiction or reality? Trends in Anaesthesia and Critical Care 2013. [DOI: 10.1016/j.tacc.2013.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
24
Abstract
Chromothripsis scars the genome when localized chromosome shattering and repair occurs in a one-off catastrophe. Outcomes of this process are detectable as massive DNA rearrangements affecting one or a few chromosomes. Although recent findings suggest a crucial role of chromothripsis in cancer development, the reproducible inference of this process remains challenging, requiring that cataclysmic one-off rearrangements be distinguished from localized lesions that occur progressively. We describe conceptual criteria for the inference of chromothripsis, based on ruling out the alternative hypothesis that stepwise rearrangements occurred. Robust means of inference may facilitate in-depth studies on the impact of, and the mechanisms underlying, chromothripsis.
25
Lucas Lledó JI, Cáceres M. On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing. PLoS One 2013; 8:e61292. [PMID: 23637806 PMCID: PMC3634047 DOI: 10.1371/journal.pone.0061292] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 12/21/2012] [Accepted: 03/07/2013] [Indexed: 12/15/2022]
Abstract
One of the most widely used techniques to study structural variation at the genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low-coverage data. However, ≥% of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far more than increases in coverage depth or read length. Finally, we review the performance of three algorithms to detect inversions (SVDetect, GRIAL, and VariationHunter), identify common pitfalls, and reveal important differences in their breakpoint precision. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects.
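How a single read pair can signal an inversion is easiest to see from the orientation of its two ends. The sketch below assumes a standard forward-reverse (FR) paired-end library and a hypothetical `ReadPair` record; it classifies one pair at a time and approximates insert size by the distance between mapping coordinates. It is not the procedure of SVDetect, GRIAL, or VariationHunter, which cluster many discordant pairs (an inversion typically yields a cluster of `++` pairs at one breakpoint and `--` pairs at the other).

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record for one mapped read pair."""
    chrom1: str
    pos1: int      # leftmost mapping coordinate of read 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, min_span: int, max_span: int) -> str:
    """Assign a structural-variant signature to one pair from an FR library."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the two ends by coordinate so orientation is read left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    span = right_pos - left_pos
    if left_strand == right_strand:
        return "inversion"   # ++ or --: one end falls inside an inverted segment
    if left_strand == "-" and right_strand == "+":
        return "everted"     # RF: signature typical of tandem duplications
    if span > max_span:
        return "deletion"    # ends map farther apart than the library allows
    if span < min_span:
        return "insertion"   # ends map closer together than expected
    return "concordant"
```

For example, `classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1300, "+"), 200, 500)` returns `"inversion"`. A pair can only reveal an inversion if its fragment spans a breakpoint and both ends map uniquely, which is consistent with the abstract's findings that inversions flanked by segmental duplications are often missed and that longer library fragments help.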
Affiliation(s)
- José Ignacio Lucas Lledó
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
- Mario Cáceres
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
26
27
Weischenfeldt J, Symmons O, Spitz F, Korbel JO. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat Rev Genet 2013; 14:125-38. [PMID: 23329113 DOI: 10.1038/nrg3373] [Citation(s) in RCA: 419] [Impact Index Per Article: 34.9] [Indexed: 12/14/2022]
Abstract
Genomic structural variants have long been implicated in phenotypic diversity and human disease, but dissecting the mechanisms by which they exert their functional impact has proven elusive. Recently, however, developments in high-throughput DNA sequencing and chromosomal engineering technology have facilitated the analysis of structural variants in human populations and model systems in unprecedented detail. In this Review, we describe how structural variants can affect molecular and cellular processes, leading to complex organismal phenotypes, including human disease. We further present advances in delineating disease-causing elements that are affected by structural variants, and we discuss future directions for research on the functional consequences of structural variants.
Affiliation(s)
- Joachim Weischenfeldt
- Genome Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg, 69117, Germany
28
Lundin S, Gruselius J, Nystedt B, Lexow P, Käller M, Lundeberg J. Hierarchical molecular tagging to resolve long continuous sequences by massively parallel sequencing. Sci Rep 2013; 3:1186. [PMID: 23470464 PMCID: PMC3592332 DOI: 10.1038/srep01186] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 09/04/2012] [Accepted: 01/09/2013] [Indexed: 01/20/2023]
Abstract
Here we demonstrate the use of massively parallel short-read sequencing systems to effectively achieve longer read lengths through hierarchical molecular tagging. We show how indexed and PCR-amplified targeted libraries are degraded, sub-sampled and arrested at timed intervals to achieve pools of differing average length, each of which is indexed with a new tag. By this process, indices of sample origin, molecular origin, and degree of degradation are incorporated to achieve a nested hierarchical structure, which is later used in data processing to order the reads over a longer distance than the sequencing system originally allows. With this protocol we show how continuous regions beyond 3000 bp can be decoded by an Illumina sequencing system, and we illustrate the potential applications by calling variants of the lambda genome, analysing TP53 in cancer cell lines, and targeting a variable canine mitochondrial region.
Affiliation(s)
- Sverker Lundin
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
- Joel Gruselius
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
- Björn Nystedt
- Science for Life Laboratory, Stockholm University, Department of Biochemistry and Biophysics, Stockholm, 106 91, Sweden
- Max Käller
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
- Joakim Lundeberg
- Science for Life Laboratory, KTH, Gene Technology, Solna, 171 65, Sweden
29
Cardoso-Moreira M, Arguello JR, Clark AG. Mutation spectrum of Drosophila CNVs revealed by breakpoint sequencing. Genome Biol 2012; 13:R119. [PMID: 23259534 PMCID: PMC4056370 DOI: 10.1186/gb-2012-13-12-r119] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 05/31/2012] [Accepted: 12/22/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND The detailed study of breakpoints associated with copy number variants (CNVs) can elucidate the mutational mechanisms that generate them, and the comparison of breakpoints across species can highlight differences in genomic architecture that may lead to lineage-specific differences in patterns of CNVs. Here, we provide a detailed analysis of Drosophila CNV breakpoints and contrast it with similar analyses recently carried out for the human genome. RESULTS By applying split-read methods to a total of 10x coverage of 454 shotgun sequence across nine lines of D. melanogaster and by re-examining a previously published dataset of CNVs detected using tiling arrays, we identified the precise breakpoints of more than 600 insertions, deletions, and duplications. Contrasting these CNVs with those found in humans showed that in both taxa CNV breakpoints fall into three classes: blunt breakpoints; simple breakpoints associated with microhomology; and breakpoints with additional nucleotides inserted/deleted and no microhomology. In both taxa CNV breakpoints are enriched for non-B DNA sequence structures, which may impair DNA replication and/or repair. However, in contrast to human genomes, non-allelic homologous recombination (NAHR) plays a negligible role in CNV formation in Drosophila. In flies, non-homologous repair mechanisms are responsible for simple, recurrent, and complex CNVs, including insertions of de novo sequence as large as 60 bp. CONCLUSIONS Humans and Drosophila differ considerably in the importance of homology-based mechanisms for the formation of CNVs, likely as a consequence of differences in the abundance and distribution of both segmental duplications and transposable elements between the two genomes.
30
Zichner T, Garfield DA, Rausch T, Stütz AM, Cannavó E, Braun M, Furlong EEM, Korbel JO. Impact of genomic structural variation in Drosophila melanogaster based on population-scale sequencing. Genome Res 2012; 23:568-79. [PMID: 23222910 PMCID: PMC3589545 DOI: 10.1101/gr.142646.112] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Indexed: 01/11/2023]
Abstract
Genomic structural variation (SV) is a major determinant of phenotypic variation. Although it has been extensively studied in humans, the nucleotide-resolution structure of SVs within the widely used model organism Drosophila remains unknown. We report a highly accurate, densely validated map of unbalanced SVs comprising 8962 deletions and 916 tandem duplications in 39 lines derived from short-read DNA sequencing in a natural population (the “Drosophila melanogaster Genetic Reference Panel,” DGRP). Most SVs (>90%) were inferred at nucleotide resolution, and a large fraction was genotyped across all samples. Comprehensive analyses of SV formation mechanisms using the short-read data revealed an abundance of SVs formed by mobile element and nonhomologous end-joining-mediated rearrangements, and clustering of variants into SV hotspots. We further observed a strong depletion of SVs overlapping genes, which, along with population genetics analyses, suggests that these SVs are often deleterious. We inferred several gene fusion events, also highlighting the potential role of SVs in the generation of novel protein products. Expression quantitative trait locus (eQTL) mapping revealed the functional impact of our high-resolution SV map, with quantifiable effects at >100 genic loci. Our map represents a resource for population-level studies of SVs in an important model organism.
Affiliation(s)
- Thomas Zichner
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
31
Kunz M, Dannemann M, Kelso J. High-throughput sequencing of the melanoma genome. Exp Dermatol 2012; 22:10-7. [PMID: 23174022 DOI: 10.1111/exd.12054] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Accepted: 10/19/2012] [Indexed: 12/16/2022]
Abstract
Next-generation sequencing technologies are now common for whole-genome, whole-exome and whole-transcriptome sequencing (RNA-seq) of tumors to identify point mutations, structural or copy number alterations and changes in gene expression. A substantial number of studies have already been performed for melanoma. One study analysed eight melanoma cell lines with RNA-seq and identified 11 novel melanoma gene fusions. Whole-exome sequencing of seven melanoma cell lines identified overlapping gain-of-function mutations in the MAP2K1 (MEK1) and MAP2K2 (MEK2) genes. Integrative sequencing of cutaneous melanoma metastases using different sequencing platforms revealed a new somatic point mutation in HRAS and a structural rearrangement affecting CDKN2C (a CDK4 inhibitor). These latter sequencing-based discoveries may be used to motivate the inclusion of the affected patients in clinical trials with specific signalling pathway inhibitors. Taken together, we are at the beginning of an era in which new sequencing technologies provide a more comprehensive view of cancer mutational landscapes and thereby a better understanding of their pathogenesis. This will also open interesting perspectives for new treatment approaches and clinical trial designs.
Affiliation(s)
- Manfred Kunz
- Department of Dermatology, Venereology and Allergology, University of Leipzig, Leipzig, Germany.
32
A streamlined method for detecting structural variants in cancer genomes by short read paired-end sequencing. PLoS One 2012; 7:e48314. [PMID: 23144753 PMCID: PMC3483208 DOI: 10.1371/journal.pone.0048314] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 07/16/2012] [Accepted: 09/24/2012] [Indexed: 01/21/2023]
Abstract
Defining the architecture of a specific cancer genome, including its structural variants, is essential for understanding tumor biology and mechanisms of oncogenesis, and for designing effective personalized therapies. Short-read paired-end sequencing is currently the most sensitive method for detecting somatic mutations that arise during tumor development. However, mapping structural variants with this method yields a large number of false-positive calls, mostly because of the repetitive nature of the genome and the difficulty of assigning correct mapping positions to short reads. This study describes a method to efficiently identify large tumor-specific deletions, inversions, duplications and translocations from low-coverage data using SVDetect or BreakDancer software and a set of novel filtering procedures designed to reduce false-positive calls. Applying our method to a spontaneous T cell lymphoma arising in a core RAG2/p53-deficient mouse, we identified 40 validated tumor-specific structural rearrangements supported by as few as 2 independent read pairs.
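The abstract reports rearrangements supported by as few as 2 independent read pairs. A minimal sketch of such a support filter, assuming discordant pairs have already been typed (for example by orientation and span) and stored in a hypothetical tuple layout, is shown below. It bins both pair ends into fixed windows, counts distinct end coordinates so that PCR duplicates count once, and keeps bins with at least `min_support` pairs. This is not the filtering procedure of the study, nor of SVDetect or BreakDancer; fixed windows are a crude stand-in for proper clustering and can split a cluster that straddles a bin boundary, and treating identical coordinates as duplicates is a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical discordant-pair record: (chrom_a, pos_a, chrom_b, pos_b, sv_type)
Pair = Tuple[str, int, str, int, str]
ClusterKey = Tuple[str, str, int, str, int]


def supported_clusters(pairs: List[Pair], window: int = 500,
                       min_support: int = 2) -> Dict[ClusterKey, int]:
    """Group discordant pairs into coarse bins and keep well-supported bins."""
    bins: Dict[ClusterKey, Set[Tuple[int, int]]] = defaultdict(set)
    for chrom_a, pos_a, chrom_b, pos_b, sv_type in pairs:
        key = (sv_type, chrom_a, pos_a // window, chrom_b, pos_b // window)
        # Storing exact end coordinates means PCR duplicates count only once.
        bins[key].add((pos_a, pos_b))
    return {key: len(ends) for key, ends in bins.items() if len(ends) >= min_support}


example_pairs: List[Pair] = [
    ("chr1", 10100, "chr1", 52000, "deletion"),
    ("chr1", 10150, "chr1", 52080, "deletion"),
    ("chr1", 10150, "chr1", 52080, "deletion"),   # duplicate of the pair above
    ("chr2", 300, "chr5", 900, "translocation"),  # single, unsupported pair
]
print(supported_clusters(example_pairs))
```

Here the two distinct deletion pairs fall into the same bin and pass the two-pair threshold, while the duplicate adds no support and the lone translocation pair is discarded. A tumor-specific call set would additionally drop clusters also seen in the matched normal sample.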
33