1. Wagner J, Olson ND, Harris L, Khan Z, Farek J, Mahmoud M, Stankovic A, Kovacevic V, Yoo B, Miller N, Rosenfeld JA, Ni B, Zarate S, Kirsche M, Aganezov S, Schatz MC, Narzisi G, Byrska-Bishop M, Clarke W, Evani US, Markello C, Shafin K, Zhou X, Sidow A, Bansal V, Ebert P, Marschall T, Lansdorp P, Hanlon V, Mattsson CA, Barrio AM, Fiddes IT, Xiao C, Fungtammasan A, Chin CS, Wenger AM, Rowell WJ, Sedlazeck FJ, Carroll A, Salit M, Zook JM. Benchmarking challenging small variants with linked and long reads. Cell Genom 2022; 2:100128. [PMID: 36452119 PMCID: PMC9706577 DOI: 10.1016/j.xgen.2022.100128]
Abstract
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.
Affiliation(s)
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Corresponding author
| | - Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
| | - Lindsay Harris
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
| | - Ziad Khan
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Jesse Farek
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Ana Stankovic
- Seven Bridges, Omladinskih brigada 90g, 11070 Belgrade, Republic of Serbia
| | - Vladimir Kovacevic
- Seven Bridges, Omladinskih brigada 90g, 11070 Belgrade, Republic of Serbia
| | - Byunggil Yoo
- Children’s Mercy Kansas City, Kansas City, MO, USA
| | - Neil Miller
- Children’s Mercy Kansas City, Kansas City, MO, USA
| | | | - Bohan Ni
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Giuseppe Narzisi
- New York Genome Center, 101 Avenue of the Americas, New York, NY, USA
| | | | - Wayne Clarke
- New York Genome Center, 101 Avenue of the Americas, New York, NY, USA
| | - Uday S. Evani
- New York Genome Center, 101 Avenue of the Americas, New York, NY, USA
| | - Charles Markello
- University of California at Santa Cruz Genomics Institute, 1156 High Street, Santa Cruz, CA, USA
| | - Kishwar Shafin
- University of California at Santa Cruz Genomics Institute, 1156 High Street, Santa Cruz, CA, USA
| | - Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Arend Sidow
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Vikas Bansal
- Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
| | - Peter Ebert
- Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
| | - Tobias Marschall
- Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
| | - Peter Lansdorp
- Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
| | - Vincent Hanlon
- Terry Fox Laboratory, BC Cancer Research Institute and Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Carl-Adam Mattsson
- Terry Fox Laboratory, BC Cancer Research Institute and Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | | | | | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | | | | | | | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Andrew Carroll
- Google Inc., 1600 Amphitheatre Pkwy., Mountain View, CA 94040, USA
| | - Marc Salit
- Joint Initiative for Metrology in Biology, SLAC National Laboratory, Stanford, CA, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Corresponding author
2. Schattgen SA, Guion K, Crawford JC, Souquette A, Barrio AM, Stubbington M, Thomas PG, Bradley P. Linking immune receptor sequence to transcriptional states with clonotype neighbor graph analysis (CoNGA). The Journal of Immunology 2021. [DOI: 10.4049/jimmunol.206.supp.26.07]
Abstract
Recent advances now allow for deep simultaneous profiling of T cell clonotypes, defined by T cell receptor (TCR) sequence, and phenotype, as reflected in gene expression (GEX) profile, surface protein expression, and epitope binding at the single-cell level. However, there are currently few tools available for unsupervised discovery of relationships between TCR sequence and cell phenotype. We hypothesized that by identifying correlations between “TCR neighborhoods”, defined by shared TCR sequence and GEX features, we could move beyond simply measuring GEX variation within clonal descendants and identify novel associations between T cell specificities and states. Previously, we introduced TCRdist, a measure for assessing inter-TCR similarity capable of identifying closely related clonotypes based on shared sequence features. Using TCRdist to quantify TCR similarity, we developed a graph-theoretic approach, clonotype neighbor-graph analysis or “CoNGA”, that identifies correlations between GEX profile and TCR sequence in an unbiased and automated manner through statistical analysis of GEX and TCR similarity graphs. Applying CoNGA, we uncovered novel associations between TCR and GEX space including a previously undescribed “natural lymphocyte” population of human blood CD8+ T cells; an association between TRBV gene usage and EPHB6 expression; and TCR sequence determinants of differentiation in thymocytes. These examples demonstrate that CoNGA is able to effectively deconvolve complex relationships between TCR sequence and cellular state. Conceptually, CoNGA could be extended to other clonally-related populations (e.g. B cells, tumors), and can easily incorporate other measurable features (e.g. ATAC-Seq).
Affiliation(s)
| | - Kate Guion
- 2Fred Hutchinson Cancer Research Center
- 3University of Southern California
| | | | - Aisha Souquette
- 1St Jude Children’s Research Hospital
- 4The University of Tennessee Health Science Center
| | | | | | | | - Philip Bradley
- 2Fred Hutchinson Cancer Research Center
- 6University of Washington
3. Chin CS, Wagner J, Zeng Q, Garrison E, Garg S, Fungtammasan A, Rautiainen M, Aganezov S, Kirsche M, Zarate S, Schatz MC, Xiao C, Rowell WJ, Markello C, Farek J, Sedlazeck FJ, Bansal V, Yoo B, Miller N, Zhou X, Carroll A, Barrio AM, Salit M, Marschall T, Dilthey AT, Zook JM. A diploid assembly-based benchmark for variants in the major histocompatibility complex. Nat Commun 2020; 11:4794. [PMID: 32963235 PMCID: PMC7508831 DOI: 10.1038/s41467-020-18564-9] [Received: 12/21/2019] [Accepted: 08/27/2020] [Open Access]
Abstract
Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful - the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC, covering 94% of the MHC and 22368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. This benchmark reliably identifies errors in mapping-based callsets, and enables performance assessment in regions with much denser, complex variation than regions covered by previous benchmarks.
Affiliation(s)
- Chen-Shan Chin
- DNAnexus, Inc, 1975 W El Camino Real, Suite 204, Mountain View, CA, 94040, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD, 20899, USA
| | - Qiandong Zeng
- Laboratory Corporation of America Holdings, 3400 Computer Drive, Westborough, MA, 01581, USA
| | - Erik Garrison
- University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, 95064, USA
| | - Shilpa Garg
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | | | - Mikko Rautiainen
- Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, 66123, Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123, Saarbrücken, Germany
- Saarland Graduate School for Computer Science, Saarland Informatics Campus E1.3, 66123, Saarbrücken, Germany
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, NY, 11724, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | | | - Charles Markello
- University of California, Santa Cruz, 1156 High St, Santa Cruz, CA, 95064, USA
| | - Jesse Farek
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Vikas Bansal
- Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
| | - Byunggil Yoo
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, 64108, USA
| | - Neil Miller
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, 64108, USA
| | - Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
| | - Andrew Carroll
- Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, 94043, USA
| | | | - Marc Salit
- Joint Initiative for Metrology in Biology, Stanford, CA, 94305, USA
| | - Tobias Marschall
- Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
| | - Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD, 20899, USA.
4. Sayadi A, Martinez Barrio A, Immonen E, Dainat J, Berger D, Tellgren-Roth C, Nystedt B, Arnqvist G. The genomic footprint of sexual conflict. Nat Ecol Evol 2019; 3:1725-1730. [DOI: 10.1038/s41559-019-1041-9] [Received: 05/07/2019] [Accepted: 10/15/2019]
Abstract
Genes with sex-biased expression show a number of unique properties and this has been seen as evidence for conflicting selection pressures in males and females, forming a genetic ‘tug-of-war’ between the sexes. However, we lack studies of taxa where an understanding of conflicting phenotypic selection in the sexes has been linked with studies of genomic signatures of sexual conflict. Here, we provide such a link. We used an insect where sexual conflict is unusually well understood, the seed beetle Callosobruchus maculatus, to test for molecular genetic signals of sexual conflict across genes with varying degrees of sex-bias in expression. We sequenced, assembled and annotated its genome and performed population resequencing of three divergent populations. Sex-biased genes showed increased levels of genetic diversity and bore a remarkably clear footprint of relaxed purifying selection. Yet, segregating genetic variation was also affected by balancing selection in weakly female-biased genes, while male-biased genes showed signs of overall purifying selection. Female-biased genes contributed disproportionally to shared polymorphism across populations, while male-biased genes, male seminal fluid protein genes and sex-linked genes did not. Genes showing genomic signatures consistent with sexual conflict generally matched life-history phenotypes known to experience sexually antagonistic selection in this species. Our results highlight metabolic and reproductive processes, confirming the key role of general life-history traits in sexual conflict.
5. Boutet SC, Walter D, Stubbington MJT, Pfeiffer KA, Lee JY, Taylor SEB, Montesclaros L, Lau JK, Riordan DP, Barrio AM, Brix L, Jacobsen K, Yeung B, Zhao X, Mikkelsen TS. Scalable and comprehensive characterization of antigen-specific CD8 T cells using multi-omics single cell analysis. The Journal of Immunology 2019. [DOI: 10.4049/jimmunol.202.supp.131.4]
Abstract
Understanding the antigen binding specificities of lymphocytes is key to the development of effective therapeutics for cancers and infectious diseases. Recent technological advancements have enabled the integration of simultaneous cell-surface protein, transcriptome, immune repertoire and antigen specificity measurements at single cell resolution, providing comprehensive, scalable, high-throughput characterization of immune cells.
Using the 10x Genomics Single Cell Immune Profiling Solution with Feature Barcoding technology with 14 oligo-conjugated antibodies and 50 Immudex peptide-MHC I Dextramer reagents (pMHC) panels spanning different CMV, EBV, Influenza, HIV and Cancer antigens, we performed multi-omic characterization of ~100,000 CD8+ T cells from four MHC-matched donors. The multi-omic combination of gene expression, paired alpha/beta T cell receptor (TCR) repertoire, cell surface proteins and pMHC binding specificity allowed the identification of CD8+ T cell subpopulations with specificity for pMHCs within our panel. We observed multiple TCRs that bound the same pMHC and identified enriched amino acid motifs within TCR sequences that shared specificities. We compared the CDR3 amino acid sequences of the pMHC-specific TCR clonotypes with previously reported sequences with the same binding specificities to show that we could identify new and known CDR3 sequences. This analytical framework provides a systematic and scalable method for deciphering TCR–pMHC specificity combined with cellular phenotype identity which is critical for developing a better understanding of the adaptive immune response to cancer and infectious diseases and will be key in the development of successful immunotherapies.
6. Sukovich DJ, Taylor SEB, Pfeiffer KA, Stubbington MJT, Lee JY, Sapida J, Roidan DP, Barrio AM, Walter D, Brix L, Jacobsen K, Yeung B, Zhao X, Mikkelsen TS. An advancement in single cell genomics allows for T cell population analysis at high resolution. The Journal of Immunology 2019. [DOI: 10.4049/jimmunol.202.supp.131.13]
Abstract
Progress in our understanding of immunology and cancer immunotherapies requires a comprehensive view of immune cell behavior and the interactions of these cells with their environment. Recent technological innovations have facilitated the combination of cell-surface protein, transcriptome, immune repertoire, and antigen specificity measurements from the same single cells, providing thorough and high-throughput lymphocyte characterization.
Using the 10x Genomics Single Cell Immune Profiling Solution with Feature Barcoding technology along with oligo-conjugated antibodies and peptide-MHC (pMHC) Dextramers®, we performed multi-omic characterization of PBMCs from cytomegalovirus (CMV) seronegative and seropositive patients. Next generation sequencing libraries were made following the 10x Genomics workflows, where transcriptome and immune repertoire libraries are generated alongside libraries from DNA barcodes conjugated to antibodies or pMHC.
Full length, paired TCRα/β sequences with specificity to known CMV antigens were identified in the seropositive donor, but not in the seronegative donor. Interestingly, a large Epstein Barr Virus (EBV) pMHC specific T cell expansion was identified in the CMV seronegative donor, suggesting an active EBV response. Moreover, the combination of transcriptomic and cell surface protein information resulted in an increase in resolution of cell type identification. This multi-omic workflow allowed the identification of enriched amino acid motifs within the TCR sequences that contained novel and known CDR3 amino acid sequences specific to CMV.
These technological advancements provide new biological insights that are critical for progress in the field.
7. Montesclaros L, Boutet SC, Taylor SEB, Stubbington MJT, Giangarra V, Lau JK, Sapida J, Ziraldo S, Pfeiffer KA, Zheng G, Barrio AM, Lee JY, Marrs S, Wu K, Mikkelsen TS. Deep characterization of tumor microenvironments using single cell multi-omics analysis. The Journal of Immunology 2019. [DOI: 10.4049/jimmunol.202.supp.194.29]
Abstract
Understanding the complexity of cell interactions in the tumor microenvironment (TME) requires the ability to distinguish each cell type, and is a prerequisite of personalized cancer treatments. Here, we use a fully integrated system for single cell RNA sequencing, to simultaneously profile the transcriptome, cell surface proteins and immune repertoire of cells from primary colorectal cancer (CRC), non-small cell lung cancer (NSCLC), and mucosa-associated lymphoid tissue (MALT) lymphoma. Each tumor varied in type and proportion of its cellular components, and in particular in the proportion of immune cells. The CRC tumor consisted of T (3% CD4+, 3% CD8+), B lymphocytes (5% CD79A+) and plasma B cells (11% IGH high). Repertoire sequencing identified a clonal expansion (>4% of B cell clonotypes) suggesting a strong B cell response in this tumor. The NSCLC tumor displayed a marked immune cell infiltration containing predominantly B cells (30% CD79A+ and 8% IGH high plasma B) with a very limited clonal expansion. The MALT lymphoma consisted of only T and B lymphocytes (31% CD4+, 8% CD8+, 57% CD79A+). In addition to the activated CD4+ and CD8+ T cells, Tfh and Treg cells were clearly identified as well as two distinct large B cell populations showing plasmacytic differentiation. Analysis of the B cell repertoire revealed a large expanded clone bearing an IGHV segment associated with parotid MALT lymphoma. These findings emphasize the importance of combining repertoire and gene expression sequencing data to determine the nature and clonality of an immune response. This method enables full characterization of tumor heterogeneity and the adaptive immune response to the TME and will be key in the development of successful immunotherapies.
8. Marks P, Garcia S, Barrio AM, Belhocine K, Bernate J, Bharadwaj R, Bjornson K, Catalanotti C, Delaney J, Fehr A, Fiddes IT, Galvin B, Heaton H, Herschleb J, Hindson C, Holt E, Jabara CB, Jett S, Keivanfar N, Kyriazopoulou-Panagiotopoulou S, Lek M, Lin B, Lowe A, Mahamdallie S, Maheshwari S, Makarewicz T, Marshall J, Meschi F, O'Keefe CJ, Ordonez H, Patel P, Price A, Royall A, Ruark E, Seal S, Schnall-Levin M, Shah P, Stafford D, Williams S, Wu I, Xu AW, Rahman N, MacArthur D, Church DM. Resolving the full spectrum of human genome variation using Linked-Reads. Genome Res 2019; 29:635-645. [PMID: 30894395 PMCID: PMC6442396 DOI: 10.1101/gr.234443.118] [Received: 01/09/2018] [Accepted: 02/21/2019]
Abstract
Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long range information while maintaining the advantages of short reads. Starting from ∼1 ng of high molecular weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow for the barcoded short reads to be associated with their original long molecules producing a novel data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes including disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
Affiliation(s)
| | | | | | | | | | | | | | | | | | - Adrian Fehr
- 10x Genomics, Pleasanton, California 94566, USA
| | | | | | | | | | | | - Esty Holt
- The Institute of Cancer Research, Division of Genetics and Epidemiology, London SM2 5NG, United Kingdom
| | | | | | | | | | - Monkol Lek
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | - Bill Lin
- 10x Genomics, Pleasanton, California 94566, USA
| | - Adam Lowe
- 10x Genomics, Pleasanton, California 94566, USA
| | - Shazia Mahamdallie
- The Institute of Cancer Research, Division of Genetics and Epidemiology, London SM2 5NG, United Kingdom
| | | | | | - Jamie Marshall
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
| | | | | | | | | | | | | | - Elise Ruark
- The Institute of Cancer Research, Division of Genetics and Epidemiology, London SM2 5NG, United Kingdom
| | - Sheila Seal
- The Institute of Cancer Research, Division of Genetics and Epidemiology, London SM2 5NG, United Kingdom
| | | | - Preyas Shah
- 10x Genomics, Pleasanton, California 94566, USA
| | | | | | - Indira Wu
- 10x Genomics, Pleasanton, California 94566, USA
| | | | - Nazneen Rahman
- The Institute of Cancer Research, Division of Genetics and Epidemiology, London SM2 5NG, United Kingdom
| | - Daniel MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
9. Martinez Barrio A, Lamichhaney S, Fan G, Rafati N, Pettersson M, Zhang H, Dainat J, Ekman D, Höppner M, Jern P, Martin M, Nystedt B, Liu X, Chen W, Liang X, Shi C, Fu Y, Ma K, Zhan X, Feng C, Gustafson U, Rubin CJ, Sällman Almén M, Blass M, Casini M, Folkvord A, Laikre L, Ryman N, Ming-Yuen Lee S, Xu X, Andersson L. The genetic basis for ecological adaptation of the Atlantic herring revealed by genome sequencing. eLife 2016; 5. [PMID: 27138043 PMCID: PMC4854517 DOI: 10.7554/elife.12081] [Received: 10/05/2015] [Accepted: 04/06/2016] [Open Access]
Abstract
Ecological adaptation is of major relevance to speciation and sustainable population management, but the underlying genetic factors are typically hard to study in natural populations due to genetic differentiation caused by natural selection being confounded with genetic drift in subdivided populations. Here, we use whole genome population sequencing of Atlantic and Baltic herring to reveal the underlying genetic architecture at an unprecedented detailed resolution for both adaptation to a new niche environment and timing of reproduction. We identify almost 500 independent loci associated with a recent niche expansion from marine (Atlantic Ocean) to brackish waters (Baltic Sea), and more than 100 independent loci showing genetic differentiation between spring- and autumn-spawning populations irrespective of geographic origin. Our results show that both coding and non-coding changes contribute to adaptation. Haplotype blocks, often spanning multiple genes and maintained by selection, are associated with genetic differentiation.
Affiliation(s)
- Alvaro Martinez Barrio
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
| | - Sangeet Lamichhaney
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Guangyi Fan
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macau, China
- BGI-Shenzhen, Shenzhen, China
| | - Nima Rafati
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Mats Pettersson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - He Zhang
- BGI-Shenzhen, Shenzhen, China
- College of Physics, Qingdao University, Qingdao, China
| | - Jacques Dainat
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Bioinformatics Infrastructure for Life Sciences, Uppsala University, Uppsala, Sweden
| | - Diana Ekman
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | - Marc Höppner
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Bioinformatics Infrastructure for Life Sciences, Uppsala University, Uppsala, Sweden
| | - Patric Jern
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Marcel Martin
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | - Björn Nystedt
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
| | - Xin Liu
- BGI-Shenzhen, Shenzhen, China
| | | | | | | | - Yuanyuan Fu
- BGI-Shenzhen, Shenzhen, China
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | | | | | - Chungang Feng
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Ulla Gustafson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Carl-Johan Rubin
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Markus Sällman Almén
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Martina Blass
- Department of Aquatic Resources, Institute of Coastal Research, Swedish University of Agricultural Sciences, Öregrund, Sweden
| | - Michele Casini
- Department of Aquatic Resources, Institute of Marine Research, Swedish University of Agricultural Sciences, Lysekil, Sweden
| | - Arild Folkvord
- Department of Biology, University of Bergen, Bergen, Norway
- Hjort Center of Marine Ecosystem Dynamics, Bergen, Norway
- Institute of Marine Research, Bergen, Norway
| | - Linda Laikre
- Department of Zoology, Stockholm University, Stockholm, Sweden
| | - Nils Ryman
- Department of Zoology, Stockholm University, Stockholm, Sweden
| | - Simon Ming-Yuen Lee
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macau, China
| | - Xun Xu
- BGI-Shenzhen, Shenzen, China
| | - Leif Andersson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.,Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden.,Department of Veterinary Integrative Biosciences, Texas A&M University, Texas, United States
10
Carneiro M, Rubin CJ, Di Palma F, Albert FW, Alföldi J, Martinez Barrio A, Pielberg G, Rafati N, Sayyab S, Turner-Maier J, Younis S, Afonso S, Aken B, Alves JM, Barrell D, Bolet G, Boucher S, Burbano HA, Campos R, Chang JL, Duranthon V, Fontanesi L, Garreau H, Heiman D, Johnson J, Mage RG, Peng Z, Queney G, Rogel-Gaillard C, Ruffier M, Searle S, Villafuerte R, Xiong A, Young S, Forsberg-Nilsson K, Good JM, Lander ES, Ferrand N, Lindblad-Toh K, Andersson L. Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication. Science 2014; 345:1074-1079. [PMID: 25170157 DOI: 10.1126/science.1253714] [Citation(s) in RCA: 256] [Impact Index Per Article: 25.6] [Indexed: 12/21/2022]
Abstract
The genetic changes underlying the initial steps of animal domestication are still poorly understood. We generated a high-quality reference genome for the rabbit and compared it to resequencing data from populations of wild and domestic rabbits. We identified more than 100 selective sweeps specific to domestic rabbits but only a relatively small number of fixed (or nearly fixed) single-nucleotide polymorphisms (SNPs) for derived alleles. SNPs with marked allele frequency differences between wild and domestic rabbits were enriched for conserved noncoding sites. Enrichment analyses suggest that genes affecting brain and neuronal development have often been targeted during domestication. We propose that because of a truly complex genetic background, tame behavior in rabbits and other domestic animals evolved by shifts in allele frequencies at many loci, rather than by critical changes at only a few domestication loci.
Affiliation(s)
- Miguel Carneiro
- CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal
- Carl-Johan Rubin
- Science of Life Laboratory Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Federica Di Palma
- Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA; Vertebrate and Health Genomics, The Genome Analysis Center, Norwich, UK
- Frank W Albert
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Jessica Alföldi
- Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
- Alvaro Martinez Barrio
- Science of Life Laboratory Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Gerli Pielberg
- Science of Life Laboratory Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Nima Rafati
- Science of Life Laboratory Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Shumaila Sayyab
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Jason Turner-Maier
- Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
- Shady Younis
- Science of Life Laboratory Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Department of Animal Production, Ain Shams University, Shoubra El-Kheima, Cairo, Egypt
- Sandra Afonso
- CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal
- Bronwen Aken
- Wellcome Trust Sanger Institute, Hinxton, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Joel M Alves
- CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal; Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Daniel Barrell
- Wellcome Trust Sanger Institute, Hinxton, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gerard Bolet
- INRA, UMR1388 Génétique, Physiologie et Systèmes d'Elevage, F-31326 Castanet-Tolosan, France
- Hernán A Burbano
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Rita Campos
- CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal
- Jean L Chang
- Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
- Veronique Duranthon
- INRA, UMR1198 Biologie du Développement et Reproduction, F-78350 Jouy-en-Josas, France
- Luca Fontanesi
- Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, 40127 Bologna, Italy
- Hervé Garreau
- INRA, UMR1388 Génétique, Physiologie et Systèmes d'Elevage, F-31326 Castanet-Tolosan, France
- David Heiman
- Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
- Jeremy Johnson
- Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
- Rose G Mage
- Laboratory of Immunology, NIAID, NIH, Bethesda, MD, 20892, USA
- Ze Peng
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 2800 Mitchell Drive, Walnut Creek, CA 94598
- Claire Rogel-Gaillard
- INRA, UMR1313 Génétique Animale et Biologie Intégrative, F-78350, Jouy-en-Josas, France
- Magali Ruffier
- Wellcome Trust Sanger Institute, Hinxton, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rafael Villafuerte
- Instituto de Estudios Sociales Avanzados (IESA-CSIC), Campo Santo de los Mártires 7, Córdoba, Spain
- Anqi Xiong
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Sarah Young
- Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
- Karin Forsberg-Nilsson
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Jeffrey M Good
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany; Division of Biological Sciences, The University of Montana, Missoula, MT 59812, USA
- Eric S Lander
- Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
- Nuno Ferrand
- CIBIO/InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661, Vairão, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre s/n. 4169-007 Porto, Portugal
- Kerstin Lindblad-Toh
- Science of Life Laboratory Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Broad Institute of Harvard and MIT, 7 Cambridge Center, Cambridge, MA 02142, USA
- Leif Andersson
- Science of Life Laboratory Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden; Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, USA
11
Martinez Barrio A, Eriksson O, Badhai J, Fröjmark AS, Bongcam-Rudloff E, Dahl N, Schuster J. Targeted resequencing and analysis of the Diamond-Blackfan anemia disease locus RPS19. PLoS One 2009; 4:e6172. [PMID: 19587786 PMCID: PMC2703794 DOI: 10.1371/journal.pone.0006172] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/16/2009] [Accepted: 05/27/2009] [Indexed: 11/19/2022] Open
Abstract
Background: The ribosomal protein S19 gene locus (RPS19) has been linked to two kinds of red cell aplasia, Diamond-Blackfan Anemia (DBA) and Transient Erythroblastopenia in Childhood (TEC). Mutations in RPS19 coding sequences have been found in 25% of DBA patients, but not in TEC patients. It has been suggested that non-coding RPS19 sequence variants contribute to the considerable clinical variability in red cell aplasia. We therefore aimed to identify non-coding variations associated with DBA or TEC phenotypes. Methodology/Principal Findings: We targeted a region of 19,980 bp encompassing the RPS19 gene for resequencing in a cohort of 89 DBA and TEC patients. We provide here a catalog of the considerable, previously unrecognized degree of variation in this region. We identified 73 variations (65 SNPs, 8 indels), all located outside the RPS19 open reading frame, of which 67.1% are classified as novel. We hypothesize that specific alleles in non-coding regions of RPS19 could alter the binding of regulatory proteins or transcription factors. Therefore, we carried out an extensive analysis to identify transcription factor binding sites (TFBS). A series of putative interaction sites coincide with detected variants. Sixteen of the corresponding transcription factors are of particular interest, as they are housekeeping genes or show a direct link to hematopoiesis, tumorigenesis or leukemia (e.g. GATA-1/2, PU.1, MZF-1). Conclusions: Specific alleles at predicted TFBSs may alter the expression of RPS19, modify an important interaction between transcription factors with overlapping TFBS, or remove an important stimulus for hematopoiesis. We suggest that the detected interactions are of importance for hematopoiesis and could provide new insights into individual response to treatment.
Affiliation(s)
- Alvaro Martinez Barrio
- The Linnaeus Centre for Bioinformatics Uppsala University/Swedish University of Agricultural Sciences, Uppsala University, Uppsala, Sweden
- Oskar Eriksson
- Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden
- Jitendra Badhai
- Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden
- Anne-Sophie Fröjmark
- Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden
- Erik Bongcam-Rudloff
- The Linnaeus Centre for Bioinformatics Uppsala University/Swedish University of Agricultural Sciences, Uppsala University, Uppsala, Sweden
- Department of Animal Breeding and Genetics, Uppsala University, Uppsala, Sweden
- Niklas Dahl
- Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden
- Jens Schuster
- Department of Genetics and Pathology, The Rudbeck Laboratory, Uppsala University, Uppsala, Sweden
12
Martinez Barrio A, Soeria-Atmadja D, Nistér A, Gustafsson MG, Hammerling U, Bongcam-Rudloff E. EVALLER: a web server for in silico assessment of potential protein allergenicity. Nucleic Acids Res 2007; 35:W694-700. [PMID: 17537818 PMCID: PMC1933222 DOI: 10.1093/nar/gkm370] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] Open
Abstract
Bioinformatics testing approaches for protein allergenicity, involving amino acid sequence comparisons, have grown appreciably in sophistication and performance over the last several years. EVALLER, the web server presented in this article, is based on our recently published 'Detection based on Filtered Length-adjusted Allergen Peptides' (DFLAP) algorithm, which affords in silico determination of potential protein allergenicity with high sensitivity and excellent specificity. To strengthen bioinformatics risk assessment in allergology, EVALLER provides a comprehensive outline of its judgment on a query protein's potential allergenicity. Each such textual output incorporates a scoring figure, a confidence numeral of the assignment and information on high- or low-scoring matches to identified allergen-related motifs, including their respective location in accordingly derived allergens. The interface, built on a modified Perl Open Source package, enables dynamic and color-coded graphic representation of key parts of the output. Moreover, pertinent details can be examined closely through zoomed views. The server can be accessed at http://bioinformatics.bmc.uu.se/evaller.html.
Affiliation(s)
- Alvaro Martinez Barrio
- Linnaeus Centre for Bioinformatics, Uppsala Biomedical Centre (BMC), Uppsala University, P.O. Box 598, SE-751 24 Uppsala, Sweden
13
Abstract
We investigated the value of the polymerase chain reaction (PCR) in the diagnosis of active tuberculosis in children and evaluated the relationship between PCR results in children with tuberculous infections and mediastinal adenopathies detected by computerized tomography (CT scan). This was a controlled, blinded, prospective study comparing nested PCR, mycobacterial cultures and the clinical diagnosis based on 350 clinical specimens from 117 children referred for evaluation of suspected pulmonary tuberculosis. All children with tuberculous infection but without active disease underwent a chest CT scan to detect the presence of mediastinal adenopathies not evident on chest radiography. The sensitivity of PCR was 56.8% in children with clinically active disease (culture: 37.8%; smears: 13.5%). A major advantage of PCR over cultures was noted when there was no parenchymal involvement on chest radiography and when the patient was undergoing anti-tuberculous treatment. There were nine specimens with false-negative PCR results due to the presence of amplification reaction inhibitors. PCR was positive in five children with tuberculous infection without active disease, and these children had mediastinal adenopathies on the CT scan that were not evident on chest radiography. There were no false-positive PCR results in the control groups of children. We conclude that nested PCR is a rapid and sensitive method for the early diagnosis of tuberculosis in children. It is especially useful when the diagnosis of active tuberculosis is difficult. In our study, children with tuberculous infection without apparent disease who had positive PCR results had mediastinal adenopathies on CT scan.
Affiliation(s)
- D Gomez-Pastrana
- Department of Pediatrics, Virgen del Rocío University Children's Hospital, Seville, Spain.