1. Zhu X, Wolfgruber TK, Leong L, Jensen M, Scott C, Winham S, Sadowski P, Vachon C, Kerlikowske K, Shepherd JA. Deep Learning Predicts Interval and Screening-detected Cancer from Screening Mammograms: A Case-Case-Control Study in 6369 Women. Radiology 2021; 301:550-558. [PMID: 34491131 PMCID: PMC8630596 DOI: 10.1148/radiol.2021203758] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/14/2022]
Abstract
Background The ability of deep learning (DL) models to classify women as at risk for either screening mammography-detected or interval cancer (not detected at mammography) has not yet been explored in the literature. Purpose To examine the ability of DL models to estimate the risk of interval and screening-detected breast cancers with and without clinical risk factors. Materials and Methods This study was performed on 25 096 digital screening mammograms obtained from January 2006 to December 2013. The mammograms were obtained in 6369 women without breast cancer, 1609 of whom developed screening-detected breast cancer and 351 of whom developed interval invasive breast cancer. A DL model was trained on the negative mammograms to classify women into those who did not develop cancer and those who developed screening-detected cancer or interval invasive cancer. Model effectiveness was evaluated as a matched concordance statistic (C statistic) in a held-out 26% (1669 of 6369) test set of the mammograms. Results The C statistics and odds ratios for comparing patients with screening-detected cancer versus matched controls were 0.66 (95% CI: 0.63, 0.69) and 1.25 (95% CI: 1.17, 1.33), respectively, for the DL model, 0.62 (95% CI: 0.59, 0.65) and 2.14 (95% CI: 1.32, 3.45) for the clinical risk factors with the Breast Imaging Reporting and Data System (BI-RADS) density model, and 0.66 (95% CI: 0.63, 0.69) and 1.21 (95% CI: 1.13, 1.30) for the combined DL and clinical risk factors model. For comparing patients with interval cancer versus controls, the C statistics and odds ratios were 0.64 (95% CI: 0.58, 0.71) and 1.26 (95% CI: 1.10, 1.45), respectively, for the DL model, 0.71 (95% CI: 0.65, 0.77) and 7.25 (95% CI: 2.94, 17.9) for the risk factors with BI-RADS density (b rated vs non-b rated) model, and 0.72 (95% CI: 0.66, 0.78) and 1.10 (95% CI: 0.94, 1.29) for the combined DL and clinical risk factors model. 
The P values between the DL, BI-RADS, and combined model's ability to detect screen and interval cancer were .99, .002, and .03, respectively. Conclusion The deep learning model outperformed in determining screening-detected cancer risk but underperformed for interval cancer risk when compared with clinical risk factors including breast density. © RSNA, 2021 Online supplemental material is available for this article. See also the editorial by Bae and Kim in this issue.
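The abstract reports a matched concordance statistic but does not define it. The sketch below shows one common definition, used here as an assumption: within each matched set, count the case-control pairs in which the case has the higher risk score, with ties counted as half. The record layout and the function name `matched_c_statistic` are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def matched_c_statistic(records):
    """Return the fraction of within-set case-control pairs ranked correctly.

    records: iterable of (matched_set_id, is_case, score) tuples.
    This layout is an assumption made for illustration only.
    """
    # Group scores by matched set: index 0 holds cases, index 1 holds controls.
    sets = defaultdict(lambda: ([], []))
    for set_id, is_case, score in records:
        sets[set_id][0 if is_case else 1].append(score)

    concordant = 0.0
    pairs = 0
    for case_scores, control_scores in sets.values():
        for case_score in case_scores:
            for control_score in control_scores:
                pairs += 1
                if case_score > control_score:
                    concordant += 1.0
                elif case_score == control_score:
                    concordant += 0.5  # ties count as half-concordant
    if pairs == 0:
        raise ValueError("no case-control pairs found")
    return concordant / pairs


example = [
    ("a", True, 0.9), ("a", False, 0.2),   # concordant pair
    ("b", True, 0.4), ("b", False, 0.6),   # discordant pair
    ("c", True, 0.5), ("c", False, 0.5),   # tied pair
]
c_stat = matched_c_statistic(example)
```

Under this definition a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking within matched sets.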

2. Benny PA, Al-Akwaa FM, Dirkx C, Schlueter RJ, Wolfgruber TK, Chern IY, Hoops S, Knights D, Garmire LX. Placentas delivered by pre-pregnant obese women have reduced abundance and diversity in the microbiome. FASEB J 2021; 35:e21524. [PMID: 33742690 PMCID: PMC8251846 DOI: 10.1096/fj.202002184rr] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/02/2020] [Revised: 02/26/2021] [Accepted: 02/26/2021] [Indexed: 12/12/2022]
Abstract
Maternal pre‐pregnancy obesity may have an impact on both maternal and fetal health. We examined the microbiome recovered from placentas in a multi‐ethnic maternal pre‐pregnant obesity cohort, through an optimized microbiome protocol to enrich low bacterial biomass samples. We found that the microbiomes recovered from the placentas of obese pre‐pregnant mothers are less abundant and less diverse when compared to those from mothers of normal pre‐pregnancy weight. Microbiome richness also decreases from the maternal side to the fetal side, demonstrating heterogeneity by geolocation within the placenta. In summary, our study shows that the microbiomes recovered from the placentas are associated with pre‐pregnancy obesity. Importance Maternal pre‐pregnancy obesity may have an impact on both maternal and fetal health. The placenta is an important organ at the interface of the mother and fetus, and supplies nutrients to the fetus. We report that the microbiomes enriched from the placentas of obese pre‐pregnant mothers are less abundant and less diverse when compared to those from mothers of normal pre‐pregnancy weight. Moreover, the microbiomes also vary by geolocation within the placenta.
Affiliation(s)
- Paula A Benny
  - Department of Epidemiology, University of Hawaii Cancer Center, Honolulu, HI, USA
- Fadhl M Al-Akwaa
  - Department of Computational Medicine and Bioinformatics, North Campus Research Complex, University of Michigan, Ann Arbor, MI, USA
- Corbin Dirkx
  - University of Minnesota Genomics Center, University of Minnesota-Twin Cities, Minneapolis, MN, USA
- Ryan J Schlueter
  - Department of Obstetrics and Gynaecology, University of Hawaii, Honolulu, HI, USA
- Thomas K Wolfgruber
  - Department of Epidemiology, University of Hawaii Cancer Center, Honolulu, HI, USA
- Ingrid Y Chern
  - Department of Obstetrics and Gynaecology, University of Hawaii, Honolulu, HI, USA
- Suzie Hoops
  - BioTechnology Institute, College of Biological Sciences, University of Minnesota, Minneapolis, MN, USA
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Dan Knights
  - BioTechnology Institute, College of Biological Sciences, University of Minnesota, Minneapolis, MN, USA
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Lana X Garmire
  - Department of Computational Medicine and Bioinformatics, North Campus Research Complex, University of Michigan, Ann Arbor, MI, USA

3. Helmkampf M, Wolfgruber TK, Bellinger MR, Paudel R, Kantar MB, Miyasaka SC, Kimball HL, Brown A, Veillet A, Read A, Shintaku M. Phylogenetic Relationships, Breeding Implications, and Cultivation History of Hawaiian Taro (Colocasia Esculenta) Through Genome-Wide SNP Genotyping. J Hered 2018; 109:272-282. [PMID: 28992295 PMCID: PMC6018804 DOI: 10.1093/jhered/esx070] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 04/06/2017] [Accepted: 08/11/2017] [Indexed: 11/22/2022]
Abstract
Taro, Colocasia esculenta, is one of the world’s oldest root crops and is of particular economic and cultural significance in Hawai’i, where historically more than 150 different landraces were grown. We developed a genome-wide set of more than 2400 high-quality single nucleotide polymorphism (SNP) markers from 70 taro accessions of Hawaiian, South Pacific, Palauan, and mainland Asian origins, with several objectives: 1) uncover the phylogenetic relationships between Hawaiian and other Pacific landraces, 2) shed light on the history of taro cultivation in Hawai’i, and 3) develop a tool to discriminate among Hawaiian and other taros. We found that almost all existing Hawaiian landraces fall into 5 monophyletic groups that are largely consistent with the traditional Hawaiian classification based on morphological characters, for example, leaf shape and petiole color. Genetic diversity was low within these clades but considerably higher between them. Population structure analyses further indicated that the diversification of taro in Hawai’i most likely occurred by a combination of frequent somatic mutation and occasional hybridization. Unexpectedly, the South Pacific accessions were found nested within the clades mainly composed of Hawaiian accessions, rather than paraphyletic to them. This suggests that the origin of clades identified here preceded the colonization of Hawai’i and that early Polynesian settlers brought taro landraces from different clades with them. In the absence of a sequenced genome, this marker set provides a valuable resource towards obtaining a genetic linkage map and to study the genetic basis of phenotypic traits of interest to taro breeding such as disease resistance.
Affiliation(s)
- Martin Helmkampf
  - Tropical Conservation Biology and Environmental Science, University of Hawai'i at Hilo, Hilo, HI
- Thomas K Wolfgruber
  - Department of Tropical Plant and Soil Sciences, University of Hawai'i at Manoa, Honolulu, HI
- M Renee Bellinger
  - Tropical Conservation Biology and Environmental Science, University of Hawai'i at Hilo, Hilo, HI
  - Department of Biology, University of Hawai'i at Hilo, Hilo, HI
- Roshan Paudel
  - Department of Tropical Plant and Soil Sciences, University of Hawai'i at Manoa, Honolulu, HI
- Michael B Kantar
  - Department of Tropical Plant and Soil Sciences, University of Hawai'i at Manoa, Honolulu, HI
- Susan C Miyasaka
  - Department of Tropical Plant and Soil Sciences, University of Hawai'i at Manoa, Honolulu, HI
- Heather L Kimball
  - Tropical Conservation Biology and Environmental Science, University of Hawai'i at Hilo, Hilo, HI
- Anne Veillet
  - Tropical Conservation Biology and Environmental Science, University of Hawai'i at Hilo, Hilo, HI
- Andrew Read
  - Plant Pathology and Plant-Microbe Biology Section, Cornell University, Ithaca, NY
- Michael Shintaku
  - College of Agriculture, Forestry & Natural Resource Management, University of Hawai'i at Hilo, Hilo, HI

4. Ortega MA, Poirion O, Zhu X, Huang S, Wolfgruber TK, Sebra R, Garmire LX. Using single-cell multiple omics approaches to resolve tumor heterogeneity. Clin Transl Med 2017; 6:46. [PMID: 29285690 PMCID: PMC5746494 DOI: 10.1186/s40169-017-0177-y] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 09/21/2017] [Accepted: 12/06/2017] [Indexed: 12/31/2022]
Abstract
It has become increasingly clear that both normal and cancer tissues are composed of heterogeneous populations. Genetic variation can be attributed to the downstream effects of inherited mutations, environmental factors, or inaccurately resolved errors in transcription and replication. When lesions occur in regions that confer a proliferative advantage, it can support clonal expansion, subclonal variation, and neoplastic progression. In this manner, the complex heterogeneous microenvironment of a tumour promotes the likelihood of angiogenesis and metastasis. Recent advances in next-generation sequencing and computational biology have utilized single-cell applications to build deep profiles of individual cells that are otherwise masked in bulk profiling. In addition, the development of new techniques for combining single-cell multi-omic strategies is providing a more precise understanding of factors contributing to cellular identity, function, and growth. Continuing advancements in single-cell technology and computational deconvolution of data will be critical for reconstructing patient specific intra-tumour features and developing more personalized cancer treatments.
Affiliation(s)
- Michael A. Ortega
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
- Olivier Poirion
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
- Xun Zhu
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
  - Department of Molecular Biosciences and Bioengineering, Honolulu, HI USA
- Sijia Huang
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
  - Department of Molecular Biosciences and Bioengineering, Honolulu, HI USA
- Thomas K. Wolfgruber
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
- Robert Sebra
  - Icahn Institute and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA
- Lana X. Garmire
  - Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI USA
  - Department of Molecular Biosciences and Bioengineering, Honolulu, HI USA

5. Zhu X, Wolfgruber TK, Tasato A, Arisdakessian C, Garmire DG, Garmire LX. Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists. Genome Med 2017; 9:108. [PMID: 29202807 PMCID: PMC5716224 DOI: 10.1186/s13073-017-0492-3] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 08/07/2017] [Accepted: 11/07/2017] [Indexed: 01/30/2023]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. RESULTS We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. CONCLUSIONS Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.
Affiliation(s)
- Xun Zhu
  - Graduate Program in Molecular Biology and Bioengineering, University of Hawaii at Manoa, Honolulu, HI, 96816, USA
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, 96813, USA
- Thomas K Wolfgruber
  - Graduate Program in Molecular Biology and Bioengineering, University of Hawaii at Manoa, Honolulu, HI, 96816, USA
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, 96813, USA
- Austin Tasato
  - Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, HI, 96816, USA
- Cédric Arisdakessian
  - Graduate Program in Molecular Biology and Bioengineering, University of Hawaii at Manoa, Honolulu, HI, 96816, USA
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, 96813, USA
- David G Garmire
  - Department of Electrical Engineering, University of Hawaii at Manoa, Honolulu, HI, 96816, USA
- Lana X Garmire
  - Graduate Program in Molecular Biology and Bioengineering, University of Hawaii at Manoa, Honolulu, HI, 96816, USA
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, 96813, USA

6. Jiao Y, Peluso P, Shi J, Liang T, Stitzer MC, Wang B, Campbell MS, Stein JC, Wei X, Chin CS, Guill K, Regulski M, Kumari S, Olson A, Gent J, Schneider KL, Wolfgruber TK, May MR, Springer NM, Antoniou E, McCombie WR, Presting GG, McMullen M, Ross-Ibarra J, Dawe RK, Hastie A, Rank DR, Ware D. Improved maize reference genome with single-molecule technologies. Nature 2017; 546:524-527. [PMID: 28605751 DOI: 10.1101/079004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/27/2016] [Accepted: 05/14/2017] [Indexed: 05/21/2023]
Abstract
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
Affiliation(s)
- Yinping Jiao
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Paul Peluso
  - Pacific Biosciences, Menlo Park, California 94025, USA
- Jinghua Shi
  - BioNano Genomics, San Diego, California 92121, USA
- Michelle C Stitzer
  - Department of Plant Sciences and Center for Population Biology, University of California, Davis, Davis, California 95616, USA
- Bo Wang
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Joshua C Stein
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Xuehong Wei
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Katherine Guill
  - USDA-ARS, Plant Genetics Research Unit, Columbia, Missouri 65211, USA
- Michael Regulski
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Sunita Kumari
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Andrew Olson
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Kevin L Schneider
  - Department of Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, Hawaii 96822, USA
- Thomas K Wolfgruber
  - Department of Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, Hawaii 96822, USA
- Michael R May
  - Department of Evolution and Ecology, University of California, Davis, California 95616, USA
- Nathan M Springer
  - Department of Plant Biology, University of Minnesota, St Paul, Minnesota 55108, USA
- Eric Antoniou
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Gernot G Presting
  - Department of Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, Hawaii 96822, USA
- Michael McMullen
  - USDA-ARS, Plant Genetics Research Unit, Columbia, Missouri 65211, USA
- Jeffrey Ross-Ibarra
  - Department of Plant Sciences, Center for Population Biology, and Genome Center, University of California, Davis, California 95616, USA
- R Kelly Dawe
  - University of Georgia, Athens, Georgia 30602, USA
- Alex Hastie
  - BioNano Genomics, San Diego, California 92121, USA
- David R Rank
  - Pacific Biosciences, Menlo Park, California 94025, USA
- Doreen Ware
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
  - USDA-ARS, NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, New York 14853, USA

7. Schneider KL, Xie Z, Wolfgruber TK, Presting GG. Inbreeding drives maize centromere evolution. Proc Natl Acad Sci U S A 2016. [PMID: 26858403 DOI: 10.1073/pnas.1522008113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Functional centromeres, the chromosomal sites of spindle attachment during cell division, are marked epigenetically by the centromere-specific histone H3 variant cenH3 and typically contain long stretches of centromere-specific tandem DNA repeats (∼1.8 Mb in maize). In 23 inbreds of domesticated maize chosen to represent the genetic diversity of maize germplasm, partial or nearly complete loss of the tandem DNA repeat CentC precedes 57 independent cenH3 relocation events that result in neocentromere formation. Chromosomal regions with newly acquired cenH3 are colonized by the centromere-specific retrotransposon CR2 at a rate that would result in centromere-sized CR2 clusters in 20,000-95,000 y. Three lines of evidence indicate that CentC loss is linked to inbreeding, including (i) CEN10 of temperate lineages, presumed to have experienced a genetic bottleneck, contain less CentC than their tropical relatives; (ii) strong selection for centromere-linked genes in domesticated maize reduced diversity at seven of the ten maize centromeres to only one or two postdomestication haplotypes; and (iii) the centromere with the largest number of haplotypes in domesticated maize (CEN7) has the highest CentC levels in nearly all domesticated lines. Rare recombinations introduced one (CEN2) or more (CEN5) alternate CEN haplotypes while retaining a single haplotype at domestication loci linked to these centromeres. Taken together, this evidence strongly suggests that inbreeding, favored by postdomestication selection for centromere-linked genes affecting key domestication or agricultural traits, drives replacement of the tandem centromere repeats in maize and other crop plants. Similar forces may act during speciation in natural systems.
Affiliation(s)
- Kevin L Schneider
  - Department of Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, HI 96822
- Zidian Xie
  - Department of Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, HI 96822
- Thomas K Wolfgruber
  - Department of Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, HI 96822
- Gernot G Presting
  - Department of Molecular Biosciences and Bioengineering, University of Hawaii, Honolulu, HI 96822

8. Wolfgruber TK, Nakashima MM, Schneider KL, Sharma A, Xie Z, Albert PS, Xu R, Bilinski P, Dawe RK, Ross-Ibarra J, Birchler JA, Presting GG. High Quality Maize Centromere 10 Sequence Reveals Evidence of Frequent Recombination Events. Front Plant Sci 2016; 7:308. [PMID: 27047500 PMCID: PMC4806543 DOI: 10.3389/fpls.2016.00308] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/26/2015] [Accepted: 02/27/2016] [Indexed: 05/02/2023]
Abstract
The ancestral centromeres of maize contain long stretches of the tandemly arranged CentC repeat. The abundance of tandem DNA repeats and centromeric retrotransposons (CR) has presented a significant challenge to completely assembling centromeres using traditional sequencing methods. Here, we report a nearly complete assembly of the 1.85 Mb maize centromere 10 from inbred B73 using PacBio technology and BACs from the reference genome project. The error rates estimated from overlapping BAC sequences are 7 × 10⁻⁶ and 5 × 10⁻⁵ for mismatches and indels, respectively. The number of gaps in the region covered by the reassembly was reduced from 140 in the reference genome to three. Three expressed genes are located between 92 and 477 kb from the inferred ancestral CentC cluster, which lies within the region of highest centromeric repeat density. The improved assembly increased the count of full-length CR from 5 to 55 and revealed a 22.7 kb segmental duplication that occurred approximately 121,000 years ago. Our analysis provides evidence of frequent recombination events in the form of partial retrotransposons, deletions within retrotransposons, chimeric retrotransposons, segmental duplications including higher order CentC repeats, a deleted CentC monomer, centromere-proximal inversions, and insertion of mitochondrial sequences. Double-strand DNA break (DSB) repair is the most plausible mechanism for these events and may be the major driver of centromere repeat evolution and diversity. In many cases examined here, DSB repair appears to be mediated by microhomology, suggesting that tandem repeats may have evolved to efficiently repair frequent DSBs in centromeres.
Affiliation(s)
- Thomas K. Wolfgruber
  - Department of Molecular Biosciences and Bioengineering, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Megan M. Nakashima
  - Department of Molecular Biosciences and Bioengineering, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Kevin L. Schneider
  - Department of Molecular Biosciences and Bioengineering, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Anupma Sharma
  - Department of Molecular Biosciences and Bioengineering, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Zidian Xie
  - Department of Molecular Biosciences and Bioengineering, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Patrice S. Albert
  - Division of Biological Sciences, University of Missouri, Columbia, MO, USA
- Ronghui Xu
  - Department of Molecular Biosciences and Bioengineering, University of Hawai'i at Mānoa, Honolulu, HI, USA
- Paul Bilinski
  - Department of Plant Sciences, University of California Davis, Davis, CA, USA
- R. Kelly Dawe
  - Department of Plant Biology, University of Georgia, Athens, GA, USA
- James A. Birchler
  - Division of Biological Sciences, University of Missouri, Columbia, MO, USA
- Gernot G. Presting
  - Department of Molecular Biosciences and Bioengineering, University of Hawai'i at Mānoa, Honolulu, HI, USA
  - *Correspondence: Gernot G. Presting

9. Sharma A, Wolfgruber TK, Presting GG. Tandem repeats derived from centromeric retrotransposons. BMC Genomics 2013; 14:142. [PMID: 23452340 PMCID: PMC3648361 DOI: 10.1186/1471-2164-14-142] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.5] [Received: 11/22/2012] [Accepted: 02/23/2013] [Indexed: 12/26/2022]
Abstract
Background Tandem repeats are ubiquitous and abundant in higher eukaryotic genomes and constitute, along with transposable elements, much of DNA underlying centromeres and other heterochromatic domains. In maize, centromeric satellite repeat (CentC) and centromeric retrotransposons (CR), a class of Ty3/gypsy retrotransposons, are enriched at centromeres. Some satellite repeats have homology to retrotransposons and several mechanisms have been proposed to explain the expansion, contraction as well as homogenization of tandem repeats. However, the origin and evolution of tandem repeat loci remain largely unknown. Results CRM1TR and CRM4TR are novel tandem repeats that we show to be entirely derived from CR elements belonging to two different subfamilies, CRM1 and CRM4. Although these tandem repeats clearly originated in at least two separate events, they are derived from similar regions of their respective parent element, namely the long terminal repeat (LTR) and untranslated region (UTR). The 5′ ends of the monomer repeat units of CRM1TR and CRM4TR map to different locations within their respective LTRs, while their 3′ ends map to the same relative position within a conserved region of their UTRs. Based on the insertion times of heterologous retrotransposons that have inserted into these tandem repeats, amplification of the repeats is estimated to have begun at least ~4 (CRM1TR) and ~1 (CRM4TR) million years ago. Distinct CRM1TR sequence variants occupy the two CRM1TR loci, indicating that there is little or no movement of repeats between loci, even though they are separated by only ~1.4 Mb. Conclusions The discovery of two novel retrotransposon derived tandem repeats supports the conclusions from earlier studies that retrotransposons can give rise to tandem repeats in eukaryotic genomes. Analysis of monomers from two different CRM1TR loci shows that gene conversion is the major cause of sequence variation. 
We propose that successive intrastrand deletions generated the initial repeat structure, and gene conversions increased the size of each tandem repeat locus.

10. Sherwood AR, Wang N, Carlile AL, Neumann JM, Wolfgruber TK, Presting GG. The Hawaiian Freshwater Algal Database (HfwADB): a laboratory LIMS and online biodiversity resource. BMC Ecol 2012; 12:22. [PMID: 23095476 PMCID: PMC3526539 DOI: 10.1186/1472-6785-12-22] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/13/2012] [Accepted: 10/23/2012] [Indexed: 11/10/2022]
Abstract
Background Biodiversity databases serve the important role of highlighting species-level diversity from defined geographical regions. Databases that are specially designed to accommodate the types of data gathered during regional surveys are valuable in allowing full data access and display to researchers not directly involved with the project, while serving as a Laboratory Information Management System (LIMS). The Hawaiian Freshwater Algal Database, or HfwADB, was modified from the Hawaiian Algal Database to showcase non-marine algal specimens collected from the Hawaiian Archipelago by accommodating the additional level of organization required for samples including multiple species. Description The Hawaiian Freshwater Algal Database is a comprehensive and searchable database containing photographs and micrographs of samples and collection sites, geo-referenced collecting information, taxonomic data and standardized DNA sequence data. All data for individual samples are linked through unique 10-digit accession numbers (“Isolate Accession”), the first five of which correspond to the collection site (“Environmental Accession”). Users can search online for sample information by accession number, various levels of taxonomy, habitat or collection site. HfwADB is hosted at the University of Hawaii, and was made publicly accessible in October 2011. At the present time the database houses data for over 2,825 samples of non-marine algae from 1,786 collection sites from the Hawaiian Archipelago. These samples include cyanobacteria, red and green algae and diatoms, as well as lesser representation from some other algal lineages. Conclusions HfwADB is a digital repository that acts as a Laboratory Information Management System for Hawaiian non-marine algal data. 
Users can interact with the repository through the web to view relevant habitat data (including geo-referenced collection locations) and download images of collection sites, specimen photographs and micrographs, and DNA sequences. It is publicly available at http://algae.manoa.hawaii.edu/hfwadb/.
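The accession scheme described in the abstract (a 10-digit "Isolate Accession" whose first five digits are the "Environmental Accession" of the collection site) can be illustrated with a short sketch. The function name, the returned field names, and the sample value are hypothetical. The abstract does not say what the last five digits encode, so they are returned here only as an opaque suffix.

```python
import re


def split_isolate_accession(accession: str) -> dict:
    """Split a 10-digit isolate accession into its site prefix and remainder.

    Assumes the accession is exactly ten ASCII digits, as the abstract
    describes; the meaning of the last five digits is not specified there.
    """
    if not re.fullmatch(r"[0-9]{10}", accession):
        raise ValueError(f"expected a 10-digit accession, got {accession!r}")
    return {
        "environmental_accession": accession[:5],  # identifies the collection site
        "isolate_suffix": accession[5:],           # remaining digits, meaning unspecified
    }


# Made-up accession used only to show the split.
parts = split_isolate_accession("0012300045")
```

Keeping the accession as a string preserves leading zeros, which would be lost if it were stored as an integer.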
Affiliation(s)
- Alison R Sherwood
- Department of Botany, University of Hawaii at Manoa, 3190 Maile Way, Honolulu, Hawaii 96822, USA.
|
11
|
Wolfgruber TK, Presting GG. JunctionViewer: customizable annotation software for repeat-rich genomic regions. BMC Bioinformatics 2010; 11:23. [PMID: 20067643 PMCID: PMC2824676 DOI: 10.1186/1471-2105-11-23] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/21/2009] [Accepted: 01/12/2010] [Indexed: 01/20/2023]
Abstract
BACKGROUND Repeat-rich regions such as centromeres receive less attention than their gene-rich euchromatic counterparts because the former are difficult to assemble and analyze. Our objectives were to 1) map all ten centromeres onto the maize genetic map and 2) characterize the sequence features of maize centromeres, each of which spans several megabases of highly repetitive DNA. Repetitive sequences can be mapped using special molecular markers that are based on PCR with primers designed from two unique "repeat junctions". Efficient screening of large amounts of maize genome sequence data for repeat junctions, as well as for key centromere sequence features, required the development of specific annotation software. RESULTS We developed JunctionViewer to automate the process of identifying and differentiating closely related centromere repeats and repeat junctions, and to generate graphical displays of these and other features within centromeric sequences. JunctionViewer generates NCBI BLAST, WU-BLAST, cross_match and MUMmer alignments, and displays the optimal alignments and additional annotation data as concise graphical representations that can be viewed directly through the graphical interface or as PostScript output. This software enabled us to quickly characterize millions of nucleotides of newly sequenced DNA ranging in size from single reads to assembled BACs and megabase-sized pseudochromosome regions. It expedited the process of generating repeat junction markers that were subsequently used to anchor all 10 centromeres to the maize map. It also enabled us to efficiently identify key features in large genomic regions, providing insight into the arrangement and evolution of maize centromeric DNA. CONCLUSIONS JunctionViewer will be useful to scientists who wish to automatically generate concise graphical summaries of repeat sequences. It is particularly valuable for those needing to efficiently identify unique repeat junctions.
The scalability and ability to customize homology search parameters for different classes of closely related repeat sequences make this software ideal for recurring annotation (e.g., genome projects that are in progress) of genomic regions that contain well-defined repeats, such as those in centromeres. Although originally customized for maize centromere sequence, we anticipate that this software will facilitate the analysis of centromeres and other repeat-rich regions in other organisms.
Collapse
Affiliation(s)
- Thomas K Wolfgruber
- Department of Molecular Biosciences and Bioengineering, University of Hawai'i at Mānoa, Honolulu, HI 96822, USA
|
12
|
Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B, Chen W, Yan L, Higginbotham J, Cardenas M, Waligorski J, Applebaum E, Phelps L, Falcone J, Kanchi K, Thane T, Scimone A, Thane N, Henke J, Wang T, Ruppert J, Shah N, Rotter K, Hodges J, Ingenthron E, Cordes M, Kohlberg S, Sgro J, Delgado B, Mead K, Chinwalla A, Leonard S, Crouse K, Collura K, Kudrna D, Currie J, He R, Angelova A, Rajasekar S, Mueller T, Lomeli R, Scara G, Ko A, Delaney K, Wissotski M, Lopez G, Campos D, Braidotti M, Ashley E, Golser W, Kim H, Lee S, Lin J, Dujmic Z, Kim W, Talag J, Zuccolo A, Fan C, Sebastian A, Kramer M, Spiegel L, Nascimento L, Zutavern T, Miller B, Ambroise C, Muller S, Spooner W, Narechania A, Ren L, Wei S, Kumari S, Faga B, Levy MJ, McMahan L, Van Buren P, Vaughn MW, Ying K, Yeh CT, Emrich SJ, Jia Y, Kalyanaraman A, Hsia AP, Barbazuk WB, Baucom RS, Brutnell TP, Carpita NC, Chaparro C, Chia JM, Deragon JM, Estill JC, Fu Y, Jeddeloh JA, Han Y, Lee H, Li P, Lisch DR, Liu S, Liu Z, Nagel DH, McCann MC, SanMiguel P, Myers AM, Nettleton D, Nguyen J, Penning BW, Ponnala L, Schneider KL, Schwartz DC, Sharma A, Soderlund C, Springer NM, Sun Q, Wang H, Waterman M, Westerman R, Wolfgruber TK, Yang L, Yu Y, Zhang L, Zhou S, Zhu Q, Bennetzen JL, Dawe RK, Jiang J, Jiang N, Presting GG, Wessler SR, Aluru S, Martienssen RA, Clifton SW, McCombie WR, Wing RA, Wilson RK. The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science 2009; 326:1112-5. [PMID: 19965430 DOI: 10.1126/science.1178534] [Citation(s) in RCA: 2467] [Impact Index Per Article: 164.5] [Indexed: 01/08/2023]
|
13
|
Luce AC, Sharma A, Mollere OSB, Wolfgruber TK, Nagaki K, Jiang J, Presting GG, Dawe RK. Precise centromere mapping using a combination of repeat junction markers and chromatin immunoprecipitation-polymerase chain reaction. Genetics 2006; 174:1057-61. [PMID: 16951073 PMCID: PMC1602074 DOI: 10.1534/genetics.106.060467] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022]
Abstract
Centromeres are difficult to map even in species where genetic resolution is excellent. Here we show that junctions between repeats provide reliable single-copy markers for recombinant inbred mapping within centromeres and pericentromeric heterochromatin. Repeat junction mapping was combined with anti-CENH3-mediated ChIP to provide a definitive map position for maize centromere 8.
Affiliation(s)
- Amy C Luce
- Department of Plant Biology, University of Georgia, Georgia 30602, USA
|