1
Popp NA, Powell RL, Wheelock MK, Holmes KJ, Zapp BD, Sheldon KM, Fletcher SN, Wu X, Fayer S, Rubin AF, Lannert KW, Chang AT, Sheehan JP, Johnsen JM, Fowler DM. Multiplex, multimodal mapping of variant effects in secreted proteins. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2024.04.01.587474. [PMID: 39975210 PMCID: PMC11838247 DOI: 10.1101/2024.04.01.587474] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/21/2025]
Abstract
Despite widespread advances in DNA sequencing, the functional consequences of most genetic variants remain poorly understood. Multiplexed Assays of Variant Effect (MAVEs) can measure the function of variants at scale, and are beginning to address this problem. However, MAVEs cannot readily be applied to the ~10% of human genes encoding secreted proteins. We developed a flexible, scalable human cell surface display method, Multiplexed Surface Tethering of Extracellular Proteins (MultiSTEP), to measure secreted protein variant effects. We used MultiSTEP to study the consequences of missense variation in coagulation factor IX (FIX), a serine protease where genetic variation can cause hemophilia B. We combined MultiSTEP with a panel of antibodies to detect FIX secretion and post-translational modification, measuring a total of 44,816 effects for 436 synonymous variants and 8,528 of the 8,759 possible missense variants. 49.6% of possible F9 missense variants impacted secretion, post-translational modification, or both. We also identified functional constraints on secretion within the signal peptide and for nearly all variants that caused gain or loss of cysteine. Secretion scores correlated strongly with FIX levels in hemophilia B and revealed that loss of secretion variants are particularly likely to cause severe disease. Integration of the secretion and post-translational modification scores enabled reclassification of 63.1% of F9 variants of uncertain significance in the My Life, Our Future hemophilia genotyping project. Lastly, we showed that MultiSTEP can be applied to a wide variety of secreted proteins. Thus, MultiSTEP is a multiplexed, multimodal, and generalizable method for systematically assessing variant effects in secreted proteins at scale.
Affiliation(s)
- Nicholas A. Popp
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Medical Scientist Training Program, University of Washington School of Medicine, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Rachel L. Powell
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Melinda K. Wheelock
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Kristen J. Holmes
  - Division of Hematology and Oncology, Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
  - Center for Cardiovascular Biology, University of Washington School of Medicine, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Brendan D. Zapp
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kathryn M. Sheldon
  - Division of Hematology and Oncology, Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
  - Center for Cardiovascular Biology, University of Washington School of Medicine, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Xiaoping Wu
  - Cell Marker Laboratory, Seattle Children’s Hospital, Seattle, WA
- Shawn Fayer
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Alan F. Rubin
  - Bioinformatics Division, WEHI, Parkville, VIC, AU
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC, AU
- Kerry W. Lannert
  - Division of Hematology and Oncology, Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
  - Center for Cardiovascular Biology, University of Washington School of Medicine, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Alexis T. Chang
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- John P. Sheehan
  - Division of Hematology, Medical Oncology, and Palliative Care, Department of Medicine, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin
- Jill M. Johnsen
  - Division of Hematology and Oncology, Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
  - Center for Cardiovascular Biology, University of Washington School of Medicine, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
  - Bloodworks Northwest, Seattle, WA, USA
  - Washington Center for Bleeding Disorders, Seattle, WA
- Douglas M. Fowler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
  - Department of Bioengineering, University of Washington School of Medicine, Seattle, WA
2
Boyle GE, Sitko KA, Galloway JG, Haddox HK, Bianchi AH, Dixon A, Wheelock MK, Vandi AJ, Wang ZR, Thomson RES, Garge RK, Rettie AE, Rubin AF, Geck RC, Gillam EMJ, DeWitt WS, Matsen FA, Fowler DM. Deep mutational scanning of CYP2C19 in human cells reveals a substrate specificity-abundance tradeoff. Genetics 2024; 228:iyae156. [PMID: 39319420 PMCID: PMC11538415 DOI: 10.1093/genetics/iyae156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Accepted: 08/31/2024] [Indexed: 09/26/2024]
Abstract
The cytochrome P450s enzyme family metabolizes ∼80% of small molecule drugs. Variants in cytochrome P450s can substantially alter drug metabolism, leading to improper dosing and severe adverse drug reactions. Due to low sequence conservation, predicting variant effects across cytochrome P450s is challenging. Even closely related cytochrome P450s like CYP2C9 and CYP2C19, which share 92% amino acid sequence identity, display distinct phenotypic properties. Using variant abundance by massively parallel sequencing, we measured the steady-state protein abundance of 7,660 single amino acid variants in CYP2C19 expressed in cultured human cells. Our findings confirmed critical positions and structural features essential for cytochrome P450 function, and revealed how variants at conserved positions influence abundance. We jointly analyzed 4,670 variants whose abundance was measured in both CYP2C19 and CYP2C9, finding that the homologs have different variant abundances in substrate recognition sites within the hydrophobic core. We also measured the abundance of all single and some multiple wild type amino acid exchanges between CYP2C19 and CYP2C9. While most exchanges had no effect, substitutions in substrate recognition site 4 reduced abundance in CYP2C19. Double and triple mutants showed distinct interactions, highlighting a region that points to differing thermodynamic properties between the 2 homologs. These positions are known contributors to substrate specificity, suggesting an evolutionary tradeoff between stability and enzymatic function. Finally, we analyzed 368 previously unannotated human variants, finding that 43% had decreased abundance. By comparing variant effects between these homologs, we uncovered regions underlying their functional differences, advancing our understanding of this versatile family of enzymes.
Affiliation(s)
- Gabriel E Boyle
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Katherine A Sitko
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Jared G Galloway
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Hugh K Haddox
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Aisha Haley Bianchi
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Ajeya Dixon
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Melinda K Wheelock
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Allyssa J Vandi
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Ziyu R Wang
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Raine E S Thomson
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4067, Australia
- Riddhiman K Garge
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
- Allan E Rettie
  - Department of Medicinal Chemistry, University of Washington, Seattle, WA 98195, USA
- Alan F Rubin
  - Bioinformatics Division, Walter and Eliza Hall Institute, Parkville, VIC 3052, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, VIC 3052, Australia
- Renee C Geck
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Elizabeth M J Gillam
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4067, Australia
- William S DeWitt
  - Department of Electrical Engineering and Computer Science, University of California at Berkeley, Berkeley, CA 94720, USA
- Frederick A Matsen
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Howard Hughes Medical Institute, Seattle, WA 98109, USA
  - Department of Statistics, University of Washington, Seattle, WA 98195, USA
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Department of Bioengineering, University of Washington, Seattle, WA 98195, USA
3
Weile J, Ferra G, Boyle G, Pendyala S, Amorosi C, Yeh CL, Cote AG, Kishore N, Tabet D, van Loggerenberg W, Rayhan A, Fowler DM, Dunham MJ, Roth FP. Pacybara: accurate long-read sequencing for barcoded mutagenized allelic libraries. Bioinformatics 2024; 40:btae182. [PMID: 38569896 PMCID: PMC11021806 DOI: 10.1093/bioinformatics/btae182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 03/05/2024] [Accepted: 04/02/2024] [Indexed: 04/05/2024]
Abstract
MOTIVATION: Long-read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library.
RESULTS: Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or nonunique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues.
AVAILABILITY AND IMPLEMENTATION: Pacybara, freely available at https://github.com/rothlab/pacybara, is implemented using R, Python, and bash for Linux. It runs on GNU/Linux HPC clusters via Slurm, PBS, or GridEngine schedulers. A single-machine simplex version is also available.
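The barcode-clustering idea in this abstract can be illustrated with a toy sketch. This is not Pacybara's implementation (see the repository linked in the abstract for that); it assumes equal-length barcodes, uses a simple greedy Hamming-distance rule, and every name, threshold, and read in it is hypothetical.

```python
from collections import Counter, defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatched positions; assumes equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_barcodes(reads, max_dist=1):
    """Greedily cluster (barcode, genotype) reads by barcode similarity.

    Barcodes are visited from most to least abundant. Each one joins the
    first existing cluster whose seed is within `max_dist` mismatches,
    otherwise it seeds a new cluster.
    Returns {seed_barcode: Counter of genotypes seen in that cluster}.
    """
    counts = Counter(bc for bc, _ in reads)
    seeds = []
    assignment = {}
    for bc, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for seed in seeds:
            if hamming(bc, seed) <= max_dist:
                assignment[bc] = seed
                break
        else:
            # No seed was close enough, so this barcode starts a cluster.
            seeds.append(bc)
            assignment[bc] = bc
    clusters = defaultdict(Counter)
    for bc, genotype in reads:
        clusters[assignment[bc]][genotype] += 1
    return dict(clusters)


def flag_collisions(clusters, min_fraction=0.25):
    """List seeds whose second most common genotype is well supported,
    which suggests one barcode is linked to more than one clone."""
    flagged = []
    for seed, genotypes in clusters.items():
        total = sum(genotypes.values())
        ranked = genotypes.most_common()
        if len(ranked) > 1 and ranked[1][1] / total >= min_fraction:
            flagged.append(seed)
    return sorted(flagged)


# Hypothetical toy reads: (barcode, genotype called from the long read).
reads = (
    [("ACGTAC", "p.Ala10Val")] * 3
    + [("ACGTAA", "p.Ala10Val")]  # barcode with one sequencing error
    + [("TTGCAG", "p.Gly5Ser")] * 2
    + [("TTGCAG", "p.Leu7Pro")] * 2  # same barcode, different clone
)
clusters = cluster_barcodes(reads)
collisions = flag_collisions(clusters)
```

In this toy input the erroneous barcode is absorbed into its abundant neighbor, while the barcode carrying two genotypes is flagged rather than silently assigned to one of them.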
Affiliation(s)
- Jochen Weile
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Gabrielle Ferra
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Gabriel Boyle
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Sriram Pendyala
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Clara Amorosi
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Chiann-Ling Yeh
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Atina G Cote
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Nishka Kishore
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Daniel Tabet
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Warren van Loggerenberg
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
  - Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Ashyad Rayhan
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
  - Department of Bioengineering, University of Washington, Seattle, WA 98195, United States
  - Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, United States
- Maitreya J Dunham
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Frederick P Roth
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 2E4, Canada
  - Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, United States
4
Kamath ND, Matreyek KA. Multiplex Functional Characterization of Protein Variant Libraries in Mammalian Cells with Single-Copy Genomic Integration and High-Throughput DNA Sequencing. Methods Mol Biol 2024; 2774:135-152. [PMID: 38441763 DOI: 10.1007/978-1-0716-3718-0_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
Sequencing-based, massively parallel genetic assays have enabled simultaneous characterization of the genotype-phenotype relationships for libraries encoding thousands of unique protein variants. Since plasmid transfection and lentiviral transduction have characteristics that limit multiplexing with pooled libraries, we developed a mammalian synthetic biology platform that harnesses the Bxb1 bacteriophage DNA recombinase to insert single promoterless plasmids encoding a transgene of interest into a pre-engineered "landing pad" site within the cell genome. The transgene is expressed behind a genomically integrated promoter, ensuring only one transgene is expressed per cell, preserving a strict genotype-phenotype link. Upon selecting cells based on a desired phenotype, the transgene can be sequenced to ascribe each variant a phenotypic score. We describe how to create and utilize landing pad cells for large-scale, library-based genetic experiments. Using the provided examples, the experimental template can be adapted to explore protein variants in diverse biological problems within mammalian cells.
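As a hedged illustration of how sequencing counts can be turned into a per-variant phenotypic score, the sketch below computes a log2 enrichment ratio between a selected and an unselected population, normalized to wild type. This is one common generic approach and is not taken from the chapter; the function name, the pseudocount, and the toy counts are all assumptions.

```python
import math


def enrichment_scores(pre_counts, post_counts, wild_type="WT", pseudocount=0.5):
    """Score variants by log2 enrichment after selection, relative to wild type.

    pre_counts / post_counts map variant name -> read count before and
    after phenotypic selection. A pseudocount avoids log(0) for variants
    that drop out. Both populations must contain at least one read.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log_ratio(variant):
        pre = (pre_counts.get(variant, 0) + pseudocount) / pre_total
        post = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(post / pre)

    wt = log_ratio(wild_type)
    # Subtracting the wild-type ratio puts wild type at 0 by construction.
    return {variant: log_ratio(variant) - wt for variant in pre_counts}


# Hypothetical toy counts for three variants.
pre = {"WT": 1000, "p.Ala10Val": 1000, "p.Gly5Ser": 1000}
post = {"WT": 1000, "p.Ala10Val": 1000, "p.Gly5Ser": 100}
scores = enrichment_scores(pre, post)
```

With these counts the variant depleted tenfold after selection receives a clearly negative score, while the variant that tracks wild type scores zero. Real analyses typically add replicate handling and error estimates, which this sketch omits.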
Affiliation(s)
- Nisha D Kamath
  - Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Kenneth A Matreyek
  - Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH, USA
5
Weile J, Ferra G, Boyle G, Pendyala S, Amorosi C, Yeh CL, Cote AG, Kishore N, Tabet D, van Loggerenberg W, Rayhan A, Fowler DM, Dunham MJ, Roth FP. Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.22.529427. [PMID: 36865234 PMCID: PMC9980134 DOI: 10.1101/2023.02.22.529427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Long read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library. Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or non-unique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues.
Affiliation(s)
- Jochen Weile
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4
- Gabrielle Ferra
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Gabriel Boyle
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Sriram Pendyala
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Clara Amorosi
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Chiann-Ling Yeh
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Atina G Cote
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4
- Nishka Kishore
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4
- Daniel Tabet
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4
- Warren van Loggerenberg
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4
- Ashyad Rayhan
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4
- Douglas M Fowler
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
- Maitreya J Dunham
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Frederick P Roth
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, M5G 1X5, Canada
  - Donnelly Centre, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4
  - Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
6
Kijima Y, Evans-Yamamoto D, Toyoshima H, Yachie N. A universal sequencing read interpreter. SCIENCE ADVANCES 2023; 9:eadd2793. [PMID: 36598975 PMCID: PMC9812397 DOI: 10.1126/sciadv.add2793] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/01/2022] [Accepted: 11/30/2022] [Indexed: 06/17/2023]
Abstract
Massively parallel DNA sequencing has led to the rapid growth of highly multiplexed experiments in biology. These experiments produce unique sequencing results that require specific analysis pipelines to decode highly structured reads. However, no versatile framework that interprets sequencing reads to extract their encoded information for downstream biological analysis has been developed. Here, we report INTERSTELLAR (interpretation, scalable transformation, and emulation of large-scale sequencing reads) that decodes data values encoded in theoretically any type of sequencing read and translates them into sequencing reads of another structure of choice. We demonstrated that INTERSTELLAR successfully extracted information from a range of short- and long-read sequencing reads and translated those of single-cell (sc)RNA-seq, scATAC-seq, and spatial transcriptomics to be analyzed by different software tools that have been developed for conceptually the same types of experiments. INTERSTELLAR will greatly facilitate the development of sequencing-based experiments and sharing of data analysis pipelines.
Affiliation(s)
- Yusuke Kijima
  - School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan
  - Department of Aquatic Bioscience, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Daniel Evans-Yamamoto
  - Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan
  - Institute for Advanced Biosciences, Keio University, Tsuruoka 997-0035, Japan
- Hiromi Toyoshima
  - Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan
- Nozomu Yachie
  - School of Biomedical Engineering, Faculty of Applied Science and Faculty of Medicine, The University of British Columbia, Vancouver, BC V6T 1Z3, Canada
  - Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan
  - Twitter: @yachielab