1
Mordig M, Rätsch G, Kahles A. SimReadUntil for benchmarking selective sequencing algorithms on ONT devices. Bioinformatics 2024; 40:btae199. [PMID: 38603597 PMCID: PMC11065473 DOI: 10.1093/bioinformatics/btae199] [Received: 11/02/2023] [Revised: 03/02/2024] [Accepted: 04/09/2024] [Indexed: 04/13/2024]
Abstract
MOTIVATION: The Oxford Nanopore Technologies (ONT) ReadUntil API enables selective sequencing, which aims to favor interesting over uninteresting reads, e.g. to deplete or enrich certain genomic regions. The performance gain depends on the selective sequencing decision-making algorithm (SSDA), which decides whether to reject a read, stop receiving a read, or wait for more data. Since real runs are time-consuming and costly, simulating the ONT sequencer with support for the ReadUntil API is highly beneficial for comparing and optimizing new SSDAs. Existing tools such as MinKNOW and UNCALLED return only raw signal data, are memory-intensive, require huge and often unavailable multi-fast5 files (≥100 GB), and are not clearly documented.

RESULTS: We present the ONT device simulator SimReadUntil, which takes a set of full reads as input, distributes them to channels, and plays them back in real time, including mux scans, channel gaps, and blockages. It allows the user to reject reads as well as to stop receiving data from them. Our modified ReadUntil API provides the basecalled reads rather than the raw signal, reducing computational load and focusing on the SSDA rather than on basecalling. Tuning the parameters of tools like ReadFish and ReadBouncer becomes easier because a GPU for basecalling is no longer required. We offer various methods to extract simulation parameters from a sequencing summary file, and we adapt ReadFish to replicate one of their enrichment experiments. SimReadUntil's gRPC interface allows standardized interaction from a wide range of programming languages.

AVAILABILITY AND IMPLEMENTATION: Code and fully worked examples are available on GitHub (https://github.com/ratschlab/sim_read_until).
Affiliation(s)
- Maximilian Mordig
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland
- Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen, 72076, Germany
- Gunnar Rätsch
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, Zürich, 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Department of Biology, ETH Zurich, Zürich, 8092, Switzerland
- André Kahles
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zürich, 8092, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, Zürich, 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
2
Joudaki A, Meterez A, Mustafa H, Groot Koerkamp R, Kahles A, Rätsch G. Aligning distant sequences to graphs using long seed sketches. Genome Res 2023; 33:1208-1217. [PMID: 37072187 PMCID: PMC10538362 DOI: 10.1101/gr.277659.123] [Received: 01/05/2023] [Accepted: 04/16/2023] [Indexed: 04/20/2023]
Abstract
Sequence-to-graph alignment is crucial for applications such as variant genotyping, read error correction, and genome assembly. We propose a novel seeding approach that relies on long inexact matches rather than short exact matches, and show that it yields a better time-accuracy trade-off in settings with up to a [Formula: see text] mutation rate. We use sketches of a subset of graph nodes, which are more robust to indels, and store them in a k-nearest neighbor index to avoid the curse of dimensionality. Our approach contrasts with existing methods and highlights the important role that sketching into vector space can play in bioinformatics applications. We show that our method scales to graphs with 1 billion nodes and has quasi-logarithmic query time for queries with an edit distance of [Formula: see text]. For such queries, longer sketch-based seeds yield a [Formula: see text] increase in recall compared with exact seeds. Our approach can be incorporated into other aligners, providing a novel direction for sequence-to-graph alignment.
Affiliation(s)
- Amir Joudaki
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- University Hospital Zurich, Biomedical Informatics Research, Zurich 8091, Switzerland
- Alexandru Meterez
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- Harun Mustafa
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- University Hospital Zurich, Biomedical Informatics Research, Zurich 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- André Kahles
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- University Hospital Zurich, Biomedical Informatics Research, Zurich 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Gunnar Rätsch
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland
- University Hospital Zurich, Biomedical Informatics Research, Zurich 8091, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- ETH AI Center, 8092 Zurich, Switzerland
3
Eichhoff OM, Stoffel CI, Käsler J, Briker L, Turko P, Karsai G, Zila N, Paulitschke V, Cheng PF, Leitner A, Bileck A, Zamboni N, Irmisch A, Balazs Z, Tastanova A, Pascoal S, Johansen P, Wegmann R, Mena J, Othman A, Viswanathan VS, Wenzina J, Aloia A, Saltari A, Dzung A, Aebersold R, Ak M, Al-Quaddoomi FS, Albert SI, Albinus J, Alborelli I, Andani S, Attinger PO, Bacac M, Baumhoer D, Beck-Schimmer B, Beerenwinkel N, Beisel C, Bernasconi L, Bertolini A, Bodenmiller B, Bonilla X, Bosshard L, Calgua B, Casanova R, Chevrier S, Chicherova N, Coelho R, D'Costa M, Danenberg E, Davidson N, Drãgan MA, Dummer R, Engler S, Erkens M, Eschbach K, Esposito C, Fedier A, Ferreira P, Ficek J, Frei AL, Frey B, Goetze S, Grob L, Gut G, Günther D, Haberecker M, Haeuptle P, Heinzelmann-Schwarz V, Herter S, Holtackers R, Huesser T, Immer A, Irmisch A, Jacob F, Jacobs A, Jaeger TM, Jahn K, James AR, Jermann PM, Kahles A, Kahraman A, Koelzer VH, Kuebler W, Kuipers J, Kunze CP, Kurzeder C, Lehmann KV, Levesque M, Lischetti U, Lugert S, Maass G, Manz MG, Markolin P, Mehnert M, Mena J, Metzler JM, Miglino N, Milani ES, Moch H, Muenst S, Murri R, Ng CK, Nicolet S, Nowak M, Lopez MN, Pedrioli PG, Pelkmans L, Piscuoglio S, Prummer M, Rimmer N, Ritter M, Rommel C, Rosano-González ML, Rätsch G, Santacroce N, Del Castillo JS, Schlenker R, Schwalie PC, Schwan S, Schär T, Senti G, Shao W, Singer F, Sivapatham S, Snijder B, Sobottka B, Sreedharan VT, Stark S, Stekhoven DJ, Tanna T, Theocharides AP, Thomas TM, Tolnay M, Tosevski V, Toussaint NC, Tuncel MA, Tusup M, Van Drogen A, Vetter M, Vlajnic T, Weber S, Weber WP, Wegmann R, Weller M, Wendt F, Wey N, Wicki A, Wildschut MH, Wollscheid B, Yu S, Ziegler J, Zimmermann M, Zoche M, Zuend G, Krauthammer M, Schreiber SL, Hornemann T, Distel M, Snijder B, Dummer R, Levesque MP. ROS Induction Targets Persister Cancer Cells with Low Metabolic Activity in NRAS-Mutated Melanoma. Cancer Res 2023; 83:1128-1146. 
[PMID: 36946761 DOI: 10.1158/0008-5472.can-22-1826] [Received: 06/03/2022] [Revised: 10/04/2022] [Accepted: 01/24/2023] [Indexed: 03/23/2023]
Abstract
Clinical management of melanomas with NRAS mutations is challenging. Targeting MAPK signaling is only beneficial to a small subset of patients due to resistance that arises through genetic, transcriptional, and metabolic adaptation. Identification of targetable vulnerabilities in NRAS-mutated melanoma could help improve patient treatment. Here, we used multiomics analyses to reveal that NRAS-mutated melanoma cells adopt a mesenchymal phenotype with a quiescent metabolic program to resist cellular stress induced by MEK inhibition. The metabolic alterations elevated baseline reactive oxygen species (ROS) levels, leading these cells to become highly sensitive to ROS induction. In vivo xenograft experiments and single-cell RNA sequencing demonstrated that intratumor heterogeneity necessitates the combination of a ROS inducer and a MEK inhibitor to inhibit both tumor growth and metastasis. Ex vivo pharmacoscopy of 62 human metastatic melanomas confirmed that MEK inhibitor-resistant tumors significantly benefited from the combination therapy. Finally, oxidative stress response and translational suppression corresponded with ROS-inducer sensitivity in 486 cancer cell lines, independent of cancer type. These findings link transcriptional plasticity to a metabolic phenotype that can be inhibited by ROS inducers in melanoma and other cancers.

SIGNIFICANCE: Metabolic reprogramming in drug-resistant NRAS-mutated melanoma cells confers sensitivity to ROS induction, which suppresses tumor growth and metastasis in combination with MAPK pathway inhibitors.
Affiliation(s)
- Ossia M Eichhoff
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Corinne I Stoffel
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Jan Käsler
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Luzia Briker
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Patrick Turko
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Gergely Karsai
- Institute for Clinical Chemistry, University Hospital Zurich, Zurich, Switzerland
- Zurich Center for Integrative Human Physiology (ZIHP), University of Zurich, Zurich, Switzerland
- Nina Zila
- Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Verena Paulitschke
- Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Phil F Cheng
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Andrea Bileck
- Joint Metabolome Facility, Faculty of Chemistry, University of Vienna, Vienna, Austria
- Department of Analytical Chemistry, University of Vienna, Vienna, Austria
- Nicola Zamboni
- Institute for Molecular Systems Biology, ETH Zurich, Switzerland
- Anja Irmisch
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Zsolt Balazs
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
- Biomedical Informatics, University Hospital of Zurich, Zurich, Switzerland
- Aizhan Tastanova
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Susana Pascoal
- St. Anna Children's Cancer Research Institute, Vienna, Austria
- Pål Johansen
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Rebekka Wegmann
- Institute for Molecular Systems Biology, ETH Zurich, Switzerland
- Julien Mena
- Institute for Molecular Systems Biology, ETH Zurich, Switzerland
- Alaa Othman
- Institute for Molecular Systems Biology, ETH Zurich, Switzerland
- Judith Wenzina
- Skin and Endothelium Research Division, Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Andrea Aloia
- Institute for Molecular Systems Biology, ETH Zurich, Switzerland
- Annalisa Saltari
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Andreas Dzung
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Michael Krauthammer
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
- Biomedical Informatics, University Hospital of Zurich, Zurich, Switzerland
- Thorsten Hornemann
- Institute for Clinical Chemistry, University Hospital Zurich, Zurich, Switzerland
- Zurich Center for Integrative Human Physiology (ZIHP), University of Zurich, Zurich, Switzerland
- Martin Distel
- St. Anna Children's Cancer Research Institute, Vienna, Austria
- Berend Snijder
- Institute for Molecular Systems Biology, ETH Zurich, Switzerland
- Reinhard Dummer
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
- Mitchell P Levesque
- Department of Dermatology, University of Zurich, University Hospital Zurich, Zurich, Switzerland
4
Calabrese C, Davidson NR, Demircioğlu D, Fonseca NA, He Y, Kahles A, Lehmann KV, Liu F, Shiraishi Y, Soulette CM, Urban L, Greger L, Li S, Liu D, Perry MD, Xiang Q, Zhang F, Zhang J, Bailey P, Erkek S, Hoadley KA, Hou Y, Huska MR, Kilpinen H, Korbel JO, Marin MG, Markowski J, Nandi T, Pan-Hammarström Q, Pedamallu CS, Siebert R, Stark SG, Su H, Tan P, Waszak SM, Yung C, Zhu S, Awadalla P, Creighton CJ, Meyerson M, Ouellette BFF, Wu K, Yang H, Brazma A, Brooks AN, Göke J, Rätsch G, Schwarz RF, Stegle O, Zhang Z, Wu K, Yang H, Fonseca NA, Kahles A, Lehmann KV, Urban L, Soulette CM, Shiraishi Y, Liu F, He Y, Demircioğlu D, Davidson NR, Calabrese C, Zhang J, Perry MD, Xiang Q, Greger L, Li S, Liu D, Stark SG, Zhang F, Amin SB, Bailey P, Chateigner A, Cortés-Ciriano I, Craft B, Erkek S, Frenkel-Morgenstern M, Goldman M, Hoadley KA, Hou Y, Huska MR, Khurana E, Kilpinen H, Korbel JO, Lamaze FC, Li C, Li X, Li X, Liu X, Marin MG, Markowski J, Nandi T, Nielsen MM, Ojesina AI, Pan-Hammarström Q, Park PJ, Pedamallu CS, Pedersen JS, Pederzoli P, Peifer M, Pennell NA, Perou CM, Perry MD, Petersen GM, Peto M, Petrelli N, Pedamallu CS, Petryszak R, Pfister SM, Phillips M, Pich O, Pickett HA, Pihl TD, Pillay N, Pinder S, Pinese M, Pinho AV, Pedersen JS, Pitkänen E, Pivot X, Piñeiro-Yáñez E, Planko L, Plass C, Polak P, Pons T, Popescu I, Potapova O, Prasad A, Siebert R, Preston SR, Prinz M, Pritchard AL, Prokopec SD, Provenzano E, Puente XS, Puig S, Puiggròs M, Pulido-Tamayo S, Pupo GM, Su H, Purdie CA, Quinn MC, Rabionet R, Rader JS, Radlwimmer B, Radovic P, Raeder B, Raine KM, Ramakrishna M, Ramakrishnan K, Tan P, Ramalingam S, Raphael BJ, Rathmell WK, Rausch T, Reifenberger G, Reimand J, Reis-Filho J, Reuter V, Reyes-Salazar I, Reyna MA, Teh BT, Reynolds SM, Rheinbay E, Riazalhosseini Y, Richardson AL, Richter J, Ringel M, Ringnér M, Rino Y, Rippe K, Roach J, Wang J, Roberts LR, Roberts ND, Roberts SA, Robertson AG, Robertson AJ, Rodriguez JB, Rodriguez-Martin B, 
Rodríguez-González FG, Roehrl MHA, Rohde M, Waszak SM, Rokutan H, Romieu G, Rooman I, Roques T, Rosebrock D, Rosenberg M, Rosenstiel PC, Rosenwald A, Rowe EW, Royo R, Xiong H, Rozen SG, Rubanova Y, Rubin MA, Rubio-Perez C, Rudneva VA, Rusev BC, Ruzzenente A, Rätsch G, Sabarinathan R, Sabelnykova VY, Yakneen S, Sadeghi S, Sahinalp SC, Saini N, Saito-Adachi M, Saksena G, Salcedo A, Salgado R, Salichos L, Sallari R, Saller C, Ye C, Salvia R, Sam M, Samra JS, Sanchez-Vega F, Sander C, Sanders G, Sarin R, Sarrafi I, Sasaki-Oku A, Sauer T, Yung C, Sauter G, Saw RPM, Scardoni M, Scarlett CJ, Scarpa A, Scelo G, Schadendorf D, Schein JE, Schilhabel MB, Schlesner M, Zhang X, Schlomm T, Schmidt HK, Schramm SJ, Schreiber S, Schultz N, Schumacher SE, Schwarz RF, Scolyer RA, Scott D, Scully R, Zheng L, Seethala R, Segre AV, Selander I, Semple CA, Senbabaoglu Y, Sengupta S, Sereni E, Serra S, Sgroi DC, Shackleton M, Zhu J, Shah NC, Shahabi S, Shang CA, Shang P, Shapira O, Shelton T, Shen C, Shen H, Shepherd R, Shi R, Zhu S, Shi Y, Shiah YJ, Shibata T, Shih J, Shimizu E, Shimizu K, Shin SJ, Shiraishi Y, Shmaya T, Shmulevich I, Awadalla P, Shorser SI, Short C, Shrestha R, Shringarpure SS, Shriver C, Shuai S, Sidiropoulos N, Siebert R, Sieuwerts AM, Sieverling L, Creighton CJ, Signoretti S, Sikora KO, Simbolo M, Simon R, Simons JV, Simpson JT, Simpson PT, Singer S, Sinnott-Armstrong N, Sipahimalani P, Meyerson M, Skelly TJ, Smid M, Smith J, Smith-McCune K, Socci ND, Sofia HJ, Soloway MG, Song L, Sood AK, Sothi S, Ouellette BFF, Sotiriou C, Soulette CM, Span PN, Spellman PT, Sperandio N, Spillane AJ, Spiro O, Spring J, Staaf J, Stadler PF, Wu K, Staib P, Stark SG, Stebbings L, Stefánsson ÓA, Stegle O, Stein LD, Stenhouse A, Stewart C, Stilgenbauer S, Stobbe MD, Yang H, Stratton MR, Stretch JR, Struck AJ, Stuart JM, Stunnenberg HG, Su H, Su X, Sun RX, Sungalee S, Susak H, Göke J, Suzuki A, Sweep F, Szczepanowski M, Sültmann H, Yugawa T, Tam A, Tamborero D, Tan BKT, Tan D, Tan P, 
Schwarz RF, Tanaka H, Taniguchi H, Tanskanen TJ, Tarabichi M, Tarnuzzer R, Tarpey P, Taschuk ML, Tatsuno K, Tavaré S, Taylor DF, Stegle O, Taylor-Weiner A, Teague JW, Teh BT, Tembe V, Temes J, Thai K, Thayer SP, Thiessen N, Thomas G, Thomas S, Zhang Z, Thompson A, Thompson AM, Thompson JFF, Thompson RH, Thorne H, Thorne LB, Thorogood A, Tiao G, Tijanic N, Timms LE, Brazma A, Tirabosco R, Tojo M, Tommasi S, Toon CW, Toprak UH, Torrents D, Tortora G, Tost J, Totoki Y, Townend D, Rätsch G, Traficante N, Treilleux I, Trotta JR, Trümper LHP, Tsao M, Tsunoda T, Tubio JMC, Tucker O, Turkington R, Turner DJ, Brooks AN, Tutt A, Ueno M, Ueno NT, Umbricht C, Umer HM, Underwood TJ, Urban L, Urushidate T, Ushiku T, Uusküla-Reimand L, Brazma A, Valencia A, Van Den Berg DJ, Van Laere S, Van Loo P, Van Meir EG, Van den Eynden GG, Van der Kwast T, Vasudev N, Vazquez M, Vedururu R, Brooks AN, Veluvolu U, Vembu S, Verbeke LPC, Vermeulen P, Verrill C, Viari A, Vicente D, Vicentini C, VijayRaghavan K, Viksna J, Göke J, Vilain RE, Villasante I, Vincent-Salomon A, Visakorpi T, Voet D, Vyas P, Vázquez-García I, Waddell NM, Waddell N, Wadelius C, Rätsch G, Wadi L, Wagener R, Wala JA, Wang J, Wang J, Wang L, Wang Q, Wang W, Wang Y, Wang Z, Schwarz RF, Waring PM, Warnatz HJ, Warrell J, Warren AY, Waszak SM, Wedge DC, Weichenhan D, Weinberger P, Weinstein JN, Weischenfeldt J, Stegle O, Weisenberger DJ, Welch I, Wendl MC, Werner J, Whalley JP, Wheeler DA, Whitaker HC, Wigle D, Wilkerson MD, Williams A, Zhang Z, Wilmott JS, Wilson GW, Wilson JM, Wilson RK, Winterhoff B, Wintersinger JA, Wiznerowicz M, Wolf S, Wong BH, Wong T, Aaltonen LA, Wong W, Woo Y, Wood S, Wouters BG, Wright AJ, Wright DW, Wright MH, Wu CL, Wu DY, Wu G, Abascal F, Wu J, Wu K, Wu Y, Wu Z, Xi L, Xia T, Xiang Q, Xiao X, Xing R, Xiong H, Abeshouse A, Xu Q, Xu Y, Xue H, Yachida S, Yakneen S, Yamaguchi R, Yamaguchi TN, Yamamoto M, Yamamoto S, Yamaue H, Aburatani H, Yang F, Yang H, Yang JY, Yang L, Yang L, Yang S, Yang TP, Yang 
Y, Yao X, Yaspo ML, Adams DJ, Yates L, Yau C, Ye C, Ye K, Yellapantula VD, Yoon CJ, Yoon SS, Yousif F, Yu J, Yu K, Agrawal N, Yu W, Yu Y, Yuan K, Yuan Y, Yuen D, Yung CK, Zaikova O, Zamora J, Zapatka M, Zenklusen JC, Ahn KS, Zenz T, Zeps N, Zhang CZ, Zhang F, Zhang H, Zhang H, Zhang H, Zhang J, Zhang J, Zhang J, Ahn SM, Zhang X, Zhang X, Zhang Y, Zhang Z, Zhao Z, Zheng L, Zheng X, Zhou W, Zhou Y, Zhu B, Aikata H, Zhu H, Zhu J, Zhu S, Zou L, Zou X, deFazio A, van As N, van Deurzen CHM, van de Vijver MJ, van’t Veer L, Akbani R, von Mering C, Akdemir KC, Al-Ahmadie H, Al-Sedairy ST, Al-Shahrour F, Alawi M, Albert M, Aldape K, Alexandrov LB, Ally A, Alsop K, Alvarez EG, Amary F, Amin SB, Aminou B, Ammerpohl O, Anderson MJ, Ang Y, Antonello D, Anur P, Aparicio S, Appelbaum EL, Arai Y, Aretz A, Arihiro K, Ariizumi SI, Armenia J, Arnould L, Asa S, Assenov Y, Atwal G, Aukema S, Auman JT, Aure MRR, Awadalla P, Aymerich M, Bader GD, Baez-Ortega A, Bailey MH, Bailey PJ, Balasundaram M, Balu S, Bandopadhayay P, Banks RE, Barbi S, Barbour AP, Barenboim J, Barnholtz-Sloan J, Barr H, Barrera E, Bartlett J, Bartolome J, Bassi C, Bathe OF, Baumhoer D, Bavi P, Baylin SB, Bazant W, Beardsmore D, Beck TA, Behjati S, Behren A, Niu B, Bell C, Beltran S, Benz C, Berchuck A, Bergmann AK, Bergstrom EN, Berman BP, Berney DM, Bernhart SH, Beroukhim R, Berrios M, Bersani S, Bertl J, Betancourt M, Bhandari V, Bhosle SG, Biankin AV, Bieg M, Bigner D, Binder H, Birney E, Birrer M, Biswas NK, Bjerkehagen B, Bodenheimer T, Boice L, Bonizzato G, De Bono JS, Boot A, Bootwalla MS, Borg A, Borkhardt A, Boroevich KA, Borozan I, Borst C, Bosenberg M, Bosio M, Boultwood J, Bourque G, Boutros PC, Bova GS, Bowen DT, Bowlby R, Bowtell DDL, Boyault S, Boyce R, Boyd J, Brazma A, Brennan P, Brewer DS, Brinkman AB, Bristow RG, Broaddus RR, Brock JE, Brock M, Broeks A, Brooks AN, Brooks D, Brors B, Brunak S, Bruxner TJC, Bruzos AL, Buchanan A, Buchhalter I, Buchholz C, Bullman S, Burke H, Burkhardt B, Burns KH, 
Busanovich J, Bustamante CD, Butler AP, Butte AJ, Byrne NJ, Børresen-Dale AL, Caesar-Johnson SJ, Cafferkey A, Cahill D, Calabrese C, Caldas C, Calvo F, Camacho N, Campbell PJ, Campo E, Cantù C, Cao S, Carey TE, Carlevaro-Fita J, Carlsen R, Cataldo I, Cazzola M, Cebon J, Cerfolio R, Chadwick DE, Chakravarty D, Chalmers D, Chan CWY, Chan K, Chan-Seng-Yue M, Chandan VS, Chang DK, Chanock SJ, Chantrill LA, Chateigner A, Chatterjee N, Chayama K, Chen HW, Chen J, Chen K, Chen Y, Chen Z, Cherniack AD, Chien J, Chiew YE, Chin SF, Cho J, Cho S, Choi JK, Choi W, Chomienne C, Chong Z, Choo SP, Chou A, Christ AN, Christie EL, Chuah E, Cibulskis C, Cibulskis K, Cingarlini S, Clapham P, Claviez A, Cleary S, Cloonan N, Cmero M, Collins CC, Connor AA, Cooke SL, Cooper CS, Cope L, Corbo V, Cordes MG, Cordner SM, Cortés-Ciriano I, Covington K, Cowin PA, Craft B, Craft D, Creighton CJ, Cun Y, Curley E, Cutcutache I, Czajka K, Czerniak B, Dagg RA, Danilova L, Davi MV, Davidson NR, Davies H, Davis IJ, Davis-Dusenbery BN, Dawson KJ, De La Vega FM, De Paoli-Iseppi R, Defreitas T, Tos APD, Delaneau O, Demchok JA, Demeulemeester J, Demidov GM, Demircioğlu D, Dennis NM, Denroche RE, Dentro SC, Desai N, Deshpande V, Deshwar AG, Desmedt C, Deu-Pons J, Dhalla N, Dhani NC, Dhingra P, Dhir R, DiBiase A, Diamanti K, Ding L, Ding S, Dinh HQ, Dirix L, Doddapaneni H, Donmez N, Dow MT, Drapkin R, Drechsel O, Drews RM, Serge S, Dudderidge T, Dueso-Barroso A, Dunford AJ, Dunn M, Dursi LJ, Duthie FR, Dutton-Regester K, Eagles J, Easton DF, Edmonds S, Edwards PA, Edwards SE, Eeles RA, Ehinger A, Eils J, Eils R, El-Naggar A, Eldridge M, Ellrott K, Erkek S, Escaramis G, Espiritu SMG, Estivill X, Etemadmoghadam D, Eyfjord JE, Faltas BM, Fan D, Fan Y, Faquin WC, Farcas C, Fassan M, Fatima A, Favero F, Fayzullaev N, Felau I, Fereday S, Ferguson ML, Ferretti V, Feuerbach L, Field MA, Fink JL, Finocchiaro G, Fisher C, Fittall MW, Fitzgerald A, Fitzgerald RC, Flanagan AM, Fleshner NE, Flicek P, Foekens JA, Fong 
KM, Fonseca NA, Foster CS, Fox NS, Fraser M, Frazer S, Frenkel-Morgenstern M, Friedman W, Frigola J, Fronick CC, Fujimoto A, Fujita M, Fukayama M, Fulton LA, Fulton RS, Furuta M, Futreal PA, Füllgrabe A, Gabriel SB, Gallinger S, Gambacorti-Passerini C, Gao J, Gao S, Garraway L, Garred Ø, Garrison E, Garsed DW, Gehlenborg N, Gelpi JLL, George J, Gerhard DS, Gerhauser C, Gershenwald JE, Gerstein M, Gerstung M, Getz G, Ghori M, Ghossein R, Giama NH, Gibbs RA, Gibson B, Gill AJ, Gill P, Giri DD, Glodzik D, Gnanapragasam VJ, Goebler ME, Goldman MJ, Gomez C, Gonzalez S, Gonzalez-Perez A, Gordenin DA, Gossage J, Gotoh K, Govindan R, Grabau D, Graham JS, Grant RC, Green AR, Green E, Greger L, Grehan N, Grimaldi S, Grimmond SM, Grossman RL, Grundhoff A, Gundem G, Guo Q, Gupta M, Gupta S, Gut IG, Gut M, Göke J, Ha G, Haake A, Haan D, Haas S, Haase K, Haber JE, Habermann N, Hach F, Haider S, Hama N, Hamdy FC, Hamilton A, Hamilton MP, Han L, Hanna GB, Hansmann M, Haradhvala NJ, Harismendy O, Harliwong I, Harmanci AO, Harrington E, Hasegawa T, Haussler D, Hawkins S, Hayami S, Hayashi S, Hayes DN, Hayes SJ, Hayward NK, Hazell S, He Y, Heath AP, Heath SC, Hedley D, Hegde AM, Heiman DI, Heinold MC, Heins Z, Heisler LE, Hellstrom-Lindberg E, Helmy M, Heo SG, Hepperla AJ, Heredia-Genestar JM, Herrmann C, Hersey P, Hess JM, Hilmarsdottir H, Hinton J, Hirano S, Hiraoka N, Hoadley KA, Hobolth A, Hodzic E, Hoell JI, Hoffmann S, Hofmann O, Holbrook A, Holik AZ, Hollingsworth MA, Holmes O, Holt RA, Hong C, Hong EP, Hong JH, Hooijer GK, Hornshøj H, Hosoda F, Hou Y, Hovestadt V, Howat W, Hoyle AP, Hruban RH, Hu J, Hu T, Hua X, Huang KL, Huang M, Huang MN, Huang V, Huang Y, Huber W, Hudson TJ, Hummel M, Hung JA, Huntsman D, Hupp TR, Huse J, Huska MR, Hutter B, Hutter CM, Hübschmann D, Iacobuzio-Donahue CA, Imbusch CD, Imielinski M, Imoto S, Isaacs WB, Isaev K, Ishikawa S, Iskar M, Islam SMA, Ittmann M, Ivkovic S, Izarzugaza JMG, Jacquemier J, Jakrot V, Jamieson NB, Jang GH, Jang SJ, 
Jayaseelan JC, Jayasinghe R, Jefferys SR, Jegalian K, Jennings JL, Jeon SH, Jerman L, Ji Y, Jiao W, Johansson PA, Johns AL, Johns J, Johnson R, Johnson TA, Jolly C, Joly Y, Jonasson JG, Jones CD, Jones DR, Jones DTW, Jones N, Jones SJM, Jonkers J, Ju YS, Juhl H, Jung J, Juul M, Juul RI, Juul S, Jäger N, Kabbe R, Kahles A, Kahraman A, Kaiser VB, Kakavand H, Kalimuthu S, von Kalle C, Kang KJ, Karaszi K, Karlan B, Karlić R, Karsch D, Kasaian K, Kassahn KS, Katai H, Kato M, Katoh H, Kawakami Y, Kay JD, Kazakoff SH, Kazanov MD, Keays M, Kebebew E, Kefford RF, Kellis M, Kench JG, Kennedy CJ, Kerssemakers JNA, Khoo D, Khoo V, Khuntikeo N, Khurana E, Kilpinen H, Kim HK, Kim HL, Kim HY, Kim H, Kim J, Kim J, Kim JK, Kim Y, King TA, Klapper W, Kleinheinz K, Klimczak LJ, Knappskog S, Kneba M, Knoppers BM, Koh Y, Komorowski J, Komura D, Komura M, Kong G, Kool M, Korbel JO, Korchina V, Korshunov A, Koscher M, Koster R, Kote-Jarai Z, Koures A, Kovacevic M, Kremeyer B, Kretzmer H, Kreuz M, Krishnamurthy S, Kube D, Kumar K, Kumar P, Kumar S, Kumar Y, Kundra R, Kübler K, Küppers R, Lagergren J, Lai PH, Laird PW, Lakhani SR, Lalansingh CM, Lalonde E, Lamaze FC, Lambert A, Lander E, Landgraf P, Landoni L, Langerød A, Lanzós A, Larsimont D, Larsson E, Lathrop M, Lau LMS, Lawerenz C, Lawlor RT, Lawrence MS, Lazar AJ, Lazic AM, Le X, Lee D, Lee D, Lee EA, Lee HJ, Lee JJK, Lee JY, Lee J, Lee MTM, Lee-Six H, Lehmann KV, Lehrach H, Lenze D, Leonard CR, Leongamornlert DA, Leshchiner I, Letourneau L, Letunic I, Levine DA, Lewis L, Ley T, Li C, Li CH, Li HI, Li J, Li L, Li S, Li S, Li X, Li X, Li X, Li Y, Liang H, Liang SB, Lichter P, Lin P, Lin Z, Linehan WM, Lingjærde OC, Liu D, Liu EM, Liu FFF, Liu F, Liu J, Liu X, Livingstone J, Livitz D, Livni N, Lochovsky L, Loeffler M, Long GV, Lopez-Guillermo A, Lou S, Louis DN, Lovat LB, Lu Y, Lu YJ, Lu Y, Luchini C, Lungu I, Luo X, Luxton HJ, Lynch AG, Lype L, López C, López-Otín C, Ma EZ, Ma Y, MacGrogan G, MacRae S, Macintyre G, Madsen T, Maejima 
K, Mafficini A, Maglinte DT, Maitra A, Majumder PP, Malcovati L, Malikic S, Malleo G, Mann GJ, Mantovani-Löffler L, Marchal K, Marchegiani G, Mardis ER, Margolin AA, Marin MG, Markowetz F, Markowski J, Marks J, Marques-Bonet T, Marra MA, Marsden L, Martens JWM, Martin S, Martin-Subero JI, Martincorena I, Martinez-Fundichely A, Maruvka YE, Mashl RJ, Massie CE, Matthew TJ, Matthews L, Mayer E, Mayes S, Mayo M, Mbabaali F, McCune K, McDermott U, McGillivray PD, McLellan MD, McPherson JD, McPherson JR, McPherson TA, Meier SR, Meng A, Meng S, Menzies A, Merrett ND, Merson S, Meyerson M, Meyerson W, Mieczkowski PA, Mihaiescu GL, Mijalkovic S, Mikkelsen T, Milella M, Mileshkin L, Miller CA, Miller DK, Miller JK, Mills GB, Milovanovic A, Minner S, Miotto M, Arnau GM, Mirabello L, Mitchell C, Mitchell TJ, Miyano S, Miyoshi N, Mizuno S, Molnár-Gábor F, Moore MJ, Moore RA, Morganella S, Morris QD, Morrison C, Mose LE, Moser CD, Muiños F, Mularoni L, Mungall AJ, Mungall K, Musgrove EA, Mustonen V, Mutch D, Muyas F, Muzny DM, Muñoz A, Myers J, Myklebost O, Möller P, Nagae G, Nagrial AM, Nahal-Bose HK, Nakagama H, Nakagawa H, Nakamura H, Nakamura T, Nakano K, Nandi T, Nangalia J, Nastic M, Navarro A, Navarro FCP, Neal DE, Nettekoven G, Newell F, Newhouse SJ, Newton Y, Ng AWT, Ng A, Nicholson J, Nicol D, Nie Y, Nielsen GP, Nielsen MM, Nik-Zainal S, Noble MS, Nones K, Northcott PA, Notta F, O’Connor BD, O’Donnell P, O’Donovan M, O’Meara S, O’Neill BP, O’Neill JR, Ocana D, Ochoa A, Oesper L, Ogden C, Ohdan H, Ohi K, Ohno-Machado L, Oien KA, Ojesina AI, Ojima H, Okusaka T, Omberg L, Ong CK, Ossowski S, Ott G, Ouellette BFF, P’ng C, Paczkowska M, Paiella S, Pairojkul C, Pajic M, Pan-Hammarström Q, Papaemmanuil E, Papatheodorou I, Paramasivam N, Park JW, Park JW, Park K, Park K, Park PJ, Parker JS, Parsons SL, Pass H, Pasternack D, Pastore A, Patch AM, Pauporté I, Pea A, Pearson JV. Author Correction: Genomic basis for RNA alterations in cancer. Nature 2023; 614:E37. 
[PMID: 36697831 PMCID: PMC9931574 DOI: 10.1038/s41586-022-05596-y] [Indexed: 01/26/2023]
Affiliation(s)
- Claudia Calabrese
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Natalie R. Davidson
- ETH Zurich, Zurich, Switzerland
- Memorial Sloan Kettering Cancer Center, New York, NY USA
- Weill Cornell Medical College, New York, NY USA
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Hospital Zurich, Zurich, Switzerland
- Deniz Demircioğlu
- National University of Singapore, Singapore, Singapore
- Genome Institute of Singapore, Singapore, Singapore
- Nuno A. Fonseca
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Yao He
- Peking University, Beijing, China
- André Kahles
- ETH Zurich, Zurich, Switzerland
- Memorial Sloan Kettering Cancer Center, New York, NY USA
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Hospital Zurich, Zurich, Switzerland
- Kjong-Van Lehmann
- ETH Zurich, Zurich, Switzerland
- Memorial Sloan Kettering Cancer Center, New York, NY USA
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Hospital Zurich, Zurich, Switzerland
- Fenglin Liu
- Peking University, Beijing, China
- Yuichi Shiraishi
- The University of Tokyo, Minato-ku, Japan
| | - Cameron M. Soulette
- grid.205975.c0000 0001 0740 6917University of California, Santa Cruz, Santa Cruz, CA USA
| | - Lara Urban
- grid.225360.00000 0000 9709 7726European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Liliana Greger
- grid.225360.00000 0000 9709 7726European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Siliang Li
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China ,grid.507779.b0000 0004 4910 5858China National GeneBank-Shenzhen, Shenzhen, China
| | - Dongbing Liu
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China ,grid.507779.b0000 0004 4910 5858China National GeneBank-Shenzhen, Shenzhen, China
| | - Marc D. Perry
- grid.17063.330000 0001 2157 2938Ontario Institute for Cancer Research, Toronto, Ontario, Canada ,grid.266102.10000 0001 2297 6811University of California, San Francisco, San Francisco, CA USA
| | - Qian Xiang
- grid.17063.330000 0001 2157 2938Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Fan Zhang
- grid.11135.370000 0001 2256 9319Peking University, Beijing, China
| | - Junjun Zhang
- grid.17063.330000 0001 2157 2938Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Peter Bailey
- grid.8756.c0000 0001 2193 314XUniversity of Glasgow, Glasgow, UK
| | - Serap Erkek
- grid.4709.a0000 0004 0495 846XEuropean Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Katherine A. Hoadley
- grid.10698.360000000122483208The University of North Carolina at Chapel Hill, Chapel Hill, NC USA
| | - Yong Hou
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China ,grid.507779.b0000 0004 4910 5858China National GeneBank-Shenzhen, Shenzhen, China
| | - Matthew R. Huska
- grid.419491.00000 0001 1014 0849Berlin Institute for Medical Systems Biology, Max Delbruck Center for Molecular Medicine, Berlin, Germany
| | - Helena Kilpinen
- grid.83440.3b0000000121901201University College London, London, UK
| | - Jan O. Korbel
- grid.4709.a0000 0004 0495 846XEuropean Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Maximillian G. Marin
- grid.205975.c0000 0001 0740 6917University of California, Santa Cruz, Santa Cruz, CA USA
| | - Julia Markowski
- grid.419491.00000 0001 1014 0849Berlin Institute for Medical Systems Biology, Max Delbruck Center for Molecular Medicine, Berlin, Germany
| | - Tannistha Nandi
- grid.418377.e0000 0004 0620 715XGenome Institute of Singapore, Singapore, Singapore
| | - Qiang Pan-Hammarström
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China ,grid.4714.60000 0004 1937 0626Karolinska Institutet, Stockholm, Sweden
| | - Chandra Sekhar Pedamallu
- grid.66859.340000 0004 0546 1623Broad Institute, Cambridge, MA USA ,grid.65499.370000 0001 2106 9910Dana-Farber Cancer Institute, Boston, MA USA ,grid.38142.3c000000041936754XHarvard Medical School, Boston, MA USA
| | - Reiner Siebert
- grid.410712.10000 0004 0473 882XUlm University and Ulm University Medical Center, Ulm, Germany
| | - Stefan G. Stark
- grid.5801.c0000 0001 2156 2780ETH Zurich, Zurich, Switzerland ,grid.51462.340000 0001 2171 9952Memorial Sloan Kettering Cancer Center, New York, NY USA ,grid.419765.80000 0001 2223 3006SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland ,grid.412004.30000 0004 0478 9977University Hospital Zurich, Zurich, Switzerland
| | - Hong Su
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China ,grid.507779.b0000 0004 4910 5858China National GeneBank-Shenzhen, Shenzhen, China
| | - Patrick Tan
- grid.418377.e0000 0004 0620 715XGenome Institute of Singapore, Singapore, Singapore ,grid.428397.30000 0004 0385 0924Duke-NUS Medical School, Singapore, Singapore
| | - Sebastian M. Waszak
- grid.4709.a0000 0004 0495 846XEuropean Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Christina Yung
- grid.17063.330000 0001 2157 2938Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Shida Zhu
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China ,grid.507779.b0000 0004 4910 5858China National GeneBank-Shenzhen, Shenzhen, China
| | - Philip Awadalla
- grid.17063.330000 0001 2157 2938Ontario Institute for Cancer Research, Toronto, Ontario, Canada ,grid.17063.330000 0001 2157 2938University of Toronto, Toronto, Ontario Canada
| | - Chad J. Creighton
- grid.39382.330000 0001 2160 926XBaylor College of Medicine, Houston, TX USA
| | - Matthew Meyerson
- grid.66859.340000 0004 0546 1623Broad Institute, Cambridge, MA USA ,grid.65499.370000 0001 2106 9910Dana-Farber Cancer Institute, Boston, MA USA ,grid.38142.3c000000041936754XHarvard Medical School, Boston, MA USA
| | | | - Kui Wu
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China ,grid.507779.b0000 0004 4910 5858China National GeneBank-Shenzhen, Shenzhen, China
| | - Huanming Yang
- grid.21155.320000 0001 2034 1839BGI-Shenzhen, Shenzhen, China
| | | | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
| | - Angela N. Brooks
- grid.205975.c0000 0001 0740 6917University of California, Santa Cruz, Santa Cruz, CA USA ,grid.66859.340000 0004 0546 1623Broad Institute, Cambridge, MA USA ,grid.65499.370000 0001 2106 9910Dana-Farber Cancer Institute, Boston, MA USA
| | - Jonathan Göke
- grid.418377.e0000 0004 0620 715XGenome Institute of Singapore, Singapore, Singapore ,grid.410724.40000 0004 0620 9745National Cancer Centre Singapore, Singapore, Singapore
| | - Gunnar Rätsch
- ETH Zurich, Zurich, Switzerland. .,Memorial Sloan Kettering Cancer Center, New York, NY, USA. .,Weill Cornell Medical College, New York, NY, USA. .,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland. .,University Hospital Zurich, Zurich, Switzerland.
| | - Roland F. Schwarz
- grid.225360.00000 0000 9709 7726European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK ,grid.419491.00000 0001 1014 0849Berlin Institute for Medical Systems Biology, Max Delbruck Center for Molecular Medicine, Berlin, Germany ,grid.7497.d0000 0004 0492 0584German Cancer Consortium (DKTK), partner site Berlin, Germany ,grid.7497.d0000 0004 0492 0584German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Oliver Stegle
- grid.225360.00000 0000 9709 7726European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK ,grid.4709.a0000 0004 0495 846XEuropean Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany ,grid.7497.d0000 0004 0492 0584German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Zemin Zhang
- grid.11135.370000 0001 2256 9319Peking University, Beijing, China
5
Rozhoňová H, Danciu D, Stark S, Rätsch G, Kahles A, Lehmann KV. SECEDO: SNV-based subclone detection using ultra-low coverage single-cell DNA sequencing. Bioinformatics 2022; 38:4293-4300. [PMID: 35900151 PMCID: PMC9477524 DOI: 10.1093/bioinformatics/btac510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 07/04/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Several recently developed single-cell DNA sequencing technologies enable whole-genome sequencing of thousands of cells. However, the ultra-low coverage of the sequenced data (<0.05× per cell) mostly limits their usage to the identification of copy number alterations in multi-megabase segments. Many tumors are not copy number-driven, and thus single-nucleotide variant (SNV)-based subclone detection may contribute to a more comprehensive view of intra-tumor heterogeneity. Due to the low coverage of the data, the identification of SNVs is only possible when superimposing the sequenced genomes of hundreds of genetically similar cells. Thus, we have developed a new approach that efficiently clusters tumor cells using Bayesian filtering of relevant loci and by exploiting read overlap and phasing. RESULTS We developed Single Cell Data Tumor Clusterer (SECEDO, Latin for 'to separate'), a new method to cluster tumor cells based solely on SNVs, inferred on ultra-low coverage single-cell DNA sequencing data. We applied SECEDO to a synthetic dataset simulating 7250 cells and eight tumor subclones from a single patient and were able to accurately reconstruct the clonal composition, detecting 92.11% of the somatic SNVs, with the smallest clusters representing only 6.9% of the total population. When applied to five real single-cell sequencing datasets from a breast cancer patient, each consisting of ≈2000 cells, SECEDO was able to recover the major clonal composition in each dataset at the original coverage of 0.03×, achieving an Adjusted Rand Index (ARI) score of ≈0.6. The current state-of-the-art SNV-based clustering method achieved an ARI score of ≈0, even after merging cells to create higher coverage data (a 10-fold increase), and was only able to match SECEDO's performance when pooling data from all five datasets, in addition to artificially increasing the sequencing coverage by a factor of 7.
Variant calling on the resulting clusters recovered more than twice as many SNVs as would have been detected if calling on all cells together. Further, the allelic ratio of the called SNVs on each subcluster was more than double the allelic ratio of the SNVs called without clustering, thus demonstrating that calling variants on subclones, in addition to both increasing sensitivity of SNV detection and attaching SNVs to subclones, significantly increases the confidence of the called variants. AVAILABILITY AND IMPLEMENTATION SECEDO is implemented in C++ and is publicly available at https://github.com/ratschlab/secedo. Instructions to download the data and the evaluation code to reproduce the findings in this paper are available at: https://github.com/ratschlab/secedo-evaluation. The code and data of the submitted version are archived at: https://doi.org/10.5281/zenodo.6516955. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
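SECEDO's clustering accuracy on the breast cancer data is reported as an Adjusted Rand Index (ARI). As a reminder of what that score measures, the sketch below computes the standard (Hubert and Arabie) ARI from two labelings of the same cells: 1 means the partitions agree up to relabeling, and values near 0 are what random assignment with the same cluster sizes would give. This is a generic illustration of the metric, not code from SECEDO or its evaluation repository; the function name and the cell labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index (Hubert and Arabie) between two flat clusterings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the partitions trivially agree

    # Contingency table: number of cells for each (true cluster, predicted cluster) pair
    contingency = Counter(zip(labels_true, labels_pred))
    pairs_together_both = sum(comb(n, 2) for n in contingency.values())
    pairs_true = sum(comb(n, 2) for n in Counter(labels_true).values())
    pairs_pred = sum(comb(n, 2) for n in Counter(labels_pred).values())

    # Expected index under random labelings with the same cluster sizes
    expected = pairs_true * pairs_pred / total_pairs
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells in one cluster
    return (pairs_together_both - expected) / (max_index - expected)


# Hypothetical toy labels: true subclone per cell vs. a clustering result
truth = ["A", "A", "B", "B"]
print(adjusted_rand_index(truth, ["x", "x", "y", "y"]))  # 1.0 (identical up to relabeling)
print(adjusted_rand_index(truth, ["x", "x", "y", "z"]))  # about 0.571 (= 4/7), one subclone split
```

Because the index is chance-corrected, it can also be negative, so an ARI of ≈0 for the competing method means its clusters agree with the reference no better than chance.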
Affiliation(s)
- Stefan Stark
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland
6
Abstract
Sequencing data are rapidly accumulating in public repositories. Making this resource accessible for interactive analysis at scale requires efficient approaches for its storage and indexing. There have recently been remarkable advances in building compressed representations of annotated (or colored) de Bruijn graphs for efficiently indexing k-mer sets. However, approaches for representing quantitative attributes such as gene expression or genome positions in a general manner have remained underexplored. In this work, we propose counting de Bruijn graphs, a notion generalizing annotated de Bruijn graphs by supplementing each node-label relation with one or many attributes (e.g., a k-mer count or its positions). Counting de Bruijn graphs index k-mer abundances from 2652 human RNA-seq samples in representations more than eightfold smaller than those of state-of-the-art bioinformatics tools, and are faster to construct and query. Furthermore, counting de Bruijn graphs with positional annotations losslessly represent entire reads in indexes that are on average 27% smaller than the gzip-compressed input for human Illumina RNA-seq and 57% smaller for Pacific Biosciences (PacBio) HiFi sequencing of viral samples. A complete searchable index of all viral PacBio SMRT reads from NCBI's Sequence Read Archive (SRA) (152,884 samples, 875 Gbp) comprises only 178 GB. Finally, on the full RefSeq collection, we generate a lossless and fully queryable index that is 4.6-fold smaller than the MegaBLAST index. The techniques proposed in this work naturally complement existing methods and tools using de Bruijn graphs, and significantly broaden their applicability: from indexing k-mer counts and genome positions to implementing novel sequence alignment algorithms on top of highly compressed graph-based sequence indexes.
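The core idea, attaching a quantitative attribute (here a k-mer count) to each node-label relation, can be illustrated with a plain, uncompressed dictionary. The sketch below is a hypothetical toy model only: the paper's contribution lies in compressed representations of this relation, which are not reproduced here, and details such as merging reverse complements into canonical k-mers are deliberately left out. All function names are made up for illustration.

```python
from collections import defaultdict


def build_counting_dbg(samples, k):
    """Index every k-mer of every read, keeping a count per sample (label).

    samples: dict mapping a label (e.g. a sample name) to a list of read strings.
    Returns a dict: k-mer (graph node) -> {label: count}. Reverse complements are NOT merged.
    """
    nodes = defaultdict(dict)
    for label, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                nodes[kmer][label] = nodes[kmer].get(label, 0) + 1
    return dict(nodes)


def successors(nodes, kmer):
    """Outgoing de Bruijn edges: drop the first base, append A/C/G/T, keep k-mers that exist."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in nodes]


def query(nodes, sequence, k):
    """Per-label k-mer counts along a query sequence (None where the k-mer is absent)."""
    return [nodes.get(sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


# Hypothetical toy input: two samples, k = 3
graph = build_counting_dbg({"s1": ["ACGTAC"], "s2": ["CGTACG", "ACGT"]}, k=3)
print(graph["ACG"])              # {'s1': 1, 's2': 2}
print(successors(graph, "ACG"))  # ['CGT']
print(query(graph, "ACGTT", 3))  # counts for ACG and CGT, then None for the unseen GTT
```

An annotated (colored) de Bruijn graph would only record that a label is present at a node; replacing that presence bit with a count, or with a list of positions, is the generalization the abstract describes.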
Affiliation(s)
- Mikhail Karasikov
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Harun Mustafa
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Gunnar Rätsch
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Biology at ETH Zurich, 8093 Zurich, Switzerland
- ETH AI Center, ETH Zurich, 8092 Zurich, Switzerland
- André Kahles
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
7
Markolin P, Rätsch G, Kahles A. Identification, Quantification, and Testing of Alternative Splicing Events from RNA-Seq Data Using SplAdder. Methods Mol Biol 2022; 2493:167-193. [PMID: 35751815 DOI: 10.1007/978-1-0716-2293-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Alternative splicing (AS) is a regulatory process during mRNA maturation that shapes the complex transcriptomes of higher eukaryotes. High-throughput sequencing of RNA (RNA-Seq) allows for measurements of AS transcripts at an unprecedented depth and diversity. The ever-expanding catalog of known AS events provides biological insights into gene regulation, population genetics, and disease. Here, we present an overview of the usage of SplAdder, a graph-based alternative splicing toolbox, which can integrate an arbitrarily large number of RNA-Seq alignments and a given annotation file to augment the shared annotation based on RNA-Seq evidence. The shared augmented annotation graph is then used to identify, quantify, and confirm alternative splicing events based on the RNA-Seq data. Splice graphs for individual alignments can also be tested for significant quantitative differences from other samples or groups of samples.
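To make "identify and quantify" concrete, the sketch below detects one event type, exon skipping, as a pattern in a splice graph whose nodes are exons and whose edges are introns, and scores it with a percent-spliced-in (PSI) value. This is a simplified illustration under stated assumptions, not SplAdder's code: SplAdder also handles other event types (e.g., intron retention and alternative 3'/5' splice sites), and averaging the two inclusion junctions is one common PSI convention, not necessarily the exact formula SplAdder uses. Function names, exon ids, and junction counts are hypothetical.

```python
from collections import defaultdict


def find_exon_skips(introns):
    """Return (upstream, skipped, downstream) exon triples that form exon-skip events.

    introns: iterable of (exon_a, exon_b) splice-graph edges, exon ids ordered 5' to 3'.
    An event needs the inclusion edges a->b and b->c plus the skipping edge a->c.
    """
    succ = defaultdict(set)
    for a, b in introns:
        succ[a].add(b)
    events = []
    for a in sorted(succ):
        for b in sorted(succ[a]):
            for c in sorted(succ.get(b, ())):  # .get avoids inserting keys while iterating
                if c in succ[a]:
                    events.append((a, b, c))
    return events


def percent_spliced_in(inclusion_junctions, skip_junction):
    """PSI = inclusion / (inclusion + exclusion), averaging the inclusion junction counts."""
    inclusion = sum(inclusion_junctions) / len(inclusion_junctions)
    total = inclusion + skip_junction
    return inclusion / total if total > 0 else float("nan")


# Hypothetical splice graph: exons 1..4, where exon 2 can be skipped
print(find_exon_skips({(1, 2), (2, 3), (1, 3), (3, 4)}))  # [(1, 2, 3)]
# Hypothetical junction reads: 30 on 1->2, 10 on 2->3, 20 on the skipping junction 1->3
print(percent_spliced_in((30, 10), 20))  # 0.5
```

Working on a graph rather than on individual transcripts is what lets evidence from many alignments be merged first and events be enumerated once on the shared structure.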
Affiliation(s)
- Philipp Markolin
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Gunnar Rätsch
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- André Kahles
- Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
8
Abstract
Motivation Since the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing these data are needed more than ever to truly benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently used approach for representing large sets of sequencing data. While significant progress has been made to succinctly represent the graph itself, efficient methods for storing labels on such graphs are still rapidly evolving. Results In this article, we present RowDiff, a new technique for compacting graph labels by leveraging expected similarities in annotations of vertices adjacent in the graph. RowDiff can be constructed in linear time relative to the number of vertices and labels in the graph, and in space proportional to the graph size. In addition, construction can be efficiently parallelized and distributed, making the technique applicable to graphs with trillions of nodes. RowDiff can be viewed as an intermediary sparsification step of the original annotation matrix and can thus naturally be combined with existing generic schemes for compressed binary matrices. Experiments on 10 000 RNA-seq datasets show that RowDiff combined with multi-BRWT results in a 30% reduction in annotation footprint over Mantis-MST, the most compact annotation representation previously known. Experiments on the sparser Fungi subset of the RefSeq collection show that applying RowDiff sparsification reduces the size of individual annotation columns stored as compressed bit vectors by an average factor of 42. When combining RowDiff with a multi-BRWT representation, the resulting annotation is 26 times smaller than Mantis-MST. Availability and implementation RowDiff is implemented in C++ within the MetaGraph framework. The source code and the data used in the experiments are publicly available at https://github.com/ratschlab/row_diff.
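The central trick, replacing a vertex's annotation row by its difference from the row of an adjacent vertex, can be sketched with Python sets. This follows the description in the abstract under simplifying assumptions: labels are held as sets, a successor has already been chosen for every vertex, the set of anchor vertices (stored in full, and required to break every cycle of successors) is given, and the difference is taken as a symmetric difference (XOR). It is not the MetaGraph implementation, which operates on compressed bit vectors at scale; all names and data are hypothetical.

```python
def rowdiff_encode(rows, succ, anchors):
    """Sparsify annotation rows by diffing each row against its successor's row.

    rows:    node -> frozenset of labels (one row of the annotation matrix)
    succ:    node -> chosen successor node in the graph, or None
    anchors: nodes whose rows are stored in full (must break every successor cycle)
    """
    encoded = {}
    for node, labels in rows.items():
        nxt = succ.get(node)
        if node in anchors or nxt is None:
            encoded[node] = labels               # stored as-is
        else:
            encoded[node] = labels ^ rows[nxt]   # symmetric difference (XOR) with successor
    return encoded


def rowdiff_decode(node, encoded, succ, anchors):
    """Reconstruct a full row by XOR-ing diffs along successors until an anchor is reached."""
    row = frozenset()
    while True:
        row ^= encoded[node]
        nxt = succ.get(node)
        if node in anchors or nxt is None:
            return row
        node = nxt


# Hypothetical path A -> B -> C whose rows share most labels; C is an anchor
rows = {"A": frozenset({"s1", "s2"}), "B": frozenset({"s1", "s2"}),
        "C": frozenset({"s1", "s2", "s3"})}
succ = {"A": "B", "B": "C", "C": None}
enc = rowdiff_encode(rows, succ, anchors={"C"})
print(sum(len(r) for r in rows.values()), sum(len(r) for r in enc.values()))  # 7 4
print(sorted(rowdiff_decode("A", enc, succ, {"C"})))  # ['s1', 's2']
```

The fewer set bits the diffed matrix has, the better any downstream compressed-matrix scheme performs, which is why the abstract describes RowDiff as a sparsification step to be combined with representations such as multi-BRWT. The cost is that decoding a row walks successors until it reaches an anchor.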
Affiliation(s)
- Daniel Danciu
- Department of Computer Science, Biomedical Informatics Group, ETH Zurich, Zurich, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland
- Mikhail Karasikov
- Department of Computer Science, Biomedical Informatics Group, ETH Zurich, Zurich, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Zurich, Switzerland
- Harun Mustafa
- Department of Computer Science, Biomedical Informatics Group, ETH Zurich, Zurich, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Zurich, Switzerland
- André Kahles
- Department of Computer Science, Biomedical Informatics Group, ETH Zurich, Zurich, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Zurich, Switzerland
- Gunnar Rätsch
- Department of Computer Science, Biomedical Informatics Group, ETH Zurich, Zurich, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Zurich, Switzerland; Department of Biology, ETH Zurich, Zurich, Switzerland
9
Demircioğlu D, Cukuroglu E, Kindermans M, Nandi T, Calabrese C, Fonseca NA, Kahles A, Lehmann KV, Stegle O, Brazma A, Brooks AN, Rätsch G, Tan P, Göke J. A Pan-cancer Transcriptome Analysis Reveals Pervasive Regulation through Alternative Promoters. Cell 2019; 178:1465-1477.e17. [PMID: 31491388 DOI: 10.1016/j.cell.2019.08.018] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Received: 10/15/2018] [Revised: 12/13/2018] [Accepted: 08/07/2019] [Indexed: 02/08/2023]
Abstract
Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. However, while a global change in transcription is recognized as a defining feature of cancer, the contribution of alternative promoters remains largely unexplored. Here, we infer active promoters using RNA-seq data from 18,468 cancer and normal samples, demonstrating that alternative promoters are a major contributor to context-specific regulation of transcription. We find that promoters are deregulated across tissues, cancer types, and patients, affecting known cancer genes and novel candidates. For genes with independently regulated promoters, we demonstrate that promoter activity is a more accurate predictor of patient survival than gene expression. Our study suggests that a dynamic landscape of active promoters shapes the cancer transcriptome, opening new diagnostic avenues and opportunities to further explore the interplay of regulatory mechanisms with transcriptional aberrations in cancer.
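The abstract argues that which promoter a gene uses can matter as much as how strongly the gene is expressed. One simple way to express that choice numerically is relative promoter activity: the fraction of a gene's total activity contributed by each of its promoters. The sketch below assumes per-promoter activity estimates are already available (for instance from RNA-seq reads attributable to each promoter's first exon or first splice junction; how to obtain them is outside this sketch), and the promoter and gene identifiers are hypothetical. It shows the arithmetic only, not the paper's pipeline.

```python
import math
from collections import defaultdict


def relative_promoter_activity(promoter_counts, promoter_to_gene):
    """Share of each gene's total promoter activity attributable to each promoter.

    promoter_counts:  promoter id -> activity estimate (e.g. a normalized read count)
    promoter_to_gene: promoter id -> gene id
    Promoters of genes with zero total activity get NaN (the share is undefined).
    """
    gene_totals = defaultdict(float)
    for promoter, count in promoter_counts.items():
        gene_totals[promoter_to_gene[promoter]] += count
    return {
        promoter: count / gene_totals[promoter_to_gene[promoter]]
        if gene_totals[promoter_to_gene[promoter]] > 0 else math.nan
        for promoter, count in promoter_counts.items()
    }


# Hypothetical counts: gene G1 has two promoters, gene G2 one inactive promoter
counts = {"G1_p1": 30, "G1_p2": 10, "G2_p1": 0}
genes = {"G1_p1": "G1", "G1_p2": "G1", "G2_p1": "G2"}
print(relative_promoter_activity(counts, genes))
# {'G1_p1': 0.75, 'G1_p2': 0.25, 'G2_p1': nan}
```

A switch from one promoter to another can leave the gene total unchanged while the relative values shift, which is exactly the kind of change that gene-level expression alone would miss.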
Affiliation(s)
- Deniz Demircioğlu
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; School of Computing, National University of Singapore, Singapore 117417, Singapore
- Engin Cukuroglu
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Martin Kindermans
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Tannistha Nandi
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
- Claudia Calabrese
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Genome Biology Unit, EMBL, Heidelberg, 69117, Germany
- Nuno A Fonseca
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; CIBIO/InBIO - Research Center in Biodiversity and Genetic Resources, Universidade do Porto, Vairão 4485-601, Portugal
- André Kahles
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland; Department of Biology, ETH Zurich, Zurich 8093, Switzerland; Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich 8091, Switzerland
- Kjong-Van Lehmann
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland; Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich 8091, Switzerland
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Genome Biology Unit, EMBL, Heidelberg, 69117, Germany; Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Angela N Brooks
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Gunnar Rätsch
- Department of Computer Science, ETH Zurich, Zurich 8092, Switzerland; Department of Biology, ETH Zurich, Zurich 8093, Switzerland; Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich 8091, Switzerland; Weill Cornell Medical College, New York, NY 10065, USA
- Patrick Tan
- Program in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore 169857, Singapore; Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore; Cancer Therapeutics and Stratified Oncology, Genome Institute of Singapore, Singapore 138672, Singapore; SingHealth/Duke-NUS Institute of Precision Medicine, National Heart Centre Singapore, Singapore 169856, Singapore; Cellular and Molecular Research, National Cancer Centre, Singapore 169610, Singapore; Singapore Gastric Cancer Consortium, Singapore 119074, Singapore
- Jonathan Göke
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore; Cellular and Molecular Research, National Cancer Centre, Singapore 169610, Singapore.
10
Abstract
High-throughput DNA sequencing data are accumulating in public repositories, and efficient approaches for storing and indexing such data are in high demand. In recent research, several graph data structures have been proposed to represent large sets of sequencing data and to allow for efficient querying of sequences. In particular, the concept of labeled de Bruijn graphs has been explored by several groups. Although there has been good progress toward representing the sequence graph in small space, methods for storing a set of labels on top of such graphs are still not sufficiently explored. It is also currently not clear how characteristics of the input data, such as the sparsity and correlations of labels, can help to inform the choice of method to compress the graph labeling. In this study, we present a new compression approach, the multi-ary binary relation wavelet tree (Multi-BRWT), which is adaptive to different kinds of input data. We show an improvement in compression performance of up to 29% over the basic BRWT method, and of up to 68% over the current state-of-the-art for de Bruijn graph label compression. To put our results into perspective, we present a systematic analysis of five different state-of-the-art annotation compression schemes, evaluate key metrics on both artificial and real-world data, and discuss how different data characteristics influence the compression performance. We show that the improvements of our new method can be robustly reproduced for different representative real-world data sets.
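For intuition about the structure being compressed, here is a minimal sketch of a basic binary relation wavelet tree (BRWT) over an annotation matrix: each node keeps one bit per surviving row saying whether that row has any label among the node's columns, and only rows with a 1 are passed down to the children. It is a simplified, uncompressed illustration under stated assumptions: it splits columns into contiguous groups with a fixed arity, whereas the Multi-BRWT described above adapts the tree to the input data, and a real implementation would use compressed bit vectors with fast rank support instead of Python lists. The class and method names are hypothetical.

```python
class BRWTNode:
    """Binary relation wavelet tree over a binary matrix given as rows of column sets.

    Each node stores one bit per row it sees: 1 if the row has any set bit among the
    node's columns. Children see only the rows whose bit is 1, so empty regions of a
    sparse matrix are pruned near the root.
    """

    def __init__(self, rows, columns, arity=2):
        if arity < 2:
            raise ValueError("arity must be at least 2")
        self.columns = list(columns)
        column_set = set(self.columns)
        self.bits = [1 if row & column_set else 0 for row in rows]
        # Prefix sums: rank[i] = number of 1 bits before position i
        self.rank = [0]
        for bit in self.bits:
            self.rank.append(self.rank[-1] + bit)
        self.children = []
        if len(self.columns) > 1:
            kept_rows = [row for row, bit in zip(rows, self.bits) if bit]
            group = -(-len(self.columns) // arity)  # ceil division: columns per child
            for start in range(0, len(self.columns), group):
                self.children.append(
                    BRWTNode(kept_rows, self.columns[start:start + group], arity))

    def get(self, row_index, column):
        """Return True if matrix[row_index][column] is set."""
        if column not in self.columns:
            raise KeyError(column)
        if not self.bits[row_index]:
            return False
        if not self.children:
            return True  # leaf covering exactly this column
        child_row = self.rank[row_index]  # position of this row among the kept rows
        for child in self.children:
            if column in child.columns:
                return child.get(child_row, column)

    def stored_bits(self):
        """Total number of bits over all nodes (uncompressed, for comparison)."""
        return len(self.bits) + sum(child.stored_bits() for child in self.children)


# Hypothetical 4x4 annotation matrix (rows = graph nodes, columns = labels)
rows = [{0, 1}, set(), {2}, {0, 3}]
tree = BRWTNode(rows, columns=range(4), arity=2)
print(tree.get(3, 3), tree.get(3, 1))  # True False
print(tree.stored_bits())  # 18
```

On such a tiny, dense example the tree stores more bits (18) than the 16-bit matrix itself; the savings appear on sparse matrices, where many rows have no labels within a column group and are dropped before they reach that group's subtree. Choosing the arity and which columns to group together is where an adaptive scheme can improve on this fixed split.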
Affiliation(s)
- Mikhail Karasikov
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Hospital Zurich, Zurich, Switzerland
- Harun Mustafa
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Hospital Zurich, Zurich, Switzerland
| | - Amir Joudaki
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Hospital Zurich, Zurich, Switzerland
| | | | - Gunnar Rätsch
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Hospital Zurich, Zurich, Switzerland
| | - André Kahles
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- University Hospital Zurich, Zurich, Switzerland
| |
Collapse
|
11
|
Calabrese C, Davidson NR, Demircioğlu D, Fonseca NA, He Y, Kahles A, Lehmann KV, Liu F, Shiraishi Y, Soulette CM, Urban L, Greger L, Li S, Liu D, Perry MD, Xiang Q, Zhang F, Zhang J, Bailey P, Erkek S, Hoadley KA, Hou Y, Huska MR, Kilpinen H, Korbel JO, Marin MG, Markowski J, Nandi T, Pan-Hammarström Q, Pedamallu CS, Siebert R, Stark SG, Su H, Tan P, Waszak SM, Yung C, Zhu S, Awadalla P, Creighton CJ, Meyerson M, Ouellette BFF, Wu K, Yang H, Brazma A, Brooks AN, Göke J, Rätsch G, Schwarz RF, Stegle O, Zhang Z. Genomic basis for RNA alterations in cancer. Nature 2020; 578:129-136. [PMID: 32025019 PMCID: PMC7054216 DOI: 10.1038/s41586-020-1970-0] [Citation(s) in RCA: 226] [Impact Index Per Article: 56.5] [Received: 03/29/2018] [Accepted: 12/11/2019] [Indexed: 01/27/2023]
Abstract
Transcript alterations often result from somatic changes in cancer genomes[1]. Various forms of RNA alterations have been described in cancer, including overexpression[2], altered splicing[3] and gene fusions[4]; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA)[5]. Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed 'bridged' fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.
Affiliation(s)
- Claudia Calabrese
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Natalie R. Davidson
- ETH Zurich, Zurich, Switzerland; Memorial Sloan Kettering Cancer Center, New York, NY, USA; Weill Cornell Medical College, New York, NY, USA; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; University Hospital Zurich, Zurich, Switzerland
- Deniz Demircioğlu
- National University of Singapore, Singapore, Singapore; Genome Institute of Singapore, Singapore, Singapore
- Nuno A. Fonseca
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Yao He
- Peking University, Beijing, China
- André Kahles
- ETH Zurich, Zurich, Switzerland; Memorial Sloan Kettering Cancer Center, New York, NY, USA; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; University Hospital Zurich, Zurich, Switzerland
- Kjong-Van Lehmann
- ETH Zurich, Zurich, Switzerland; Memorial Sloan Kettering Cancer Center, New York, NY, USA; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; University Hospital Zurich, Zurich, Switzerland
- Fenglin Liu
- Peking University, Beijing, China
- Yuichi Shiraishi
- The University of Tokyo, Minato-ku, Japan
- Cameron M. Soulette
- University of California, Santa Cruz, Santa Cruz, CA, USA
- Lara Urban
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Liliana Greger
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Siliang Li
- BGI-Shenzhen, Shenzhen, China; China National GeneBank-Shenzhen, Shenzhen, China
- Dongbing Liu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank-Shenzhen, Shenzhen, China
- Marc D. Perry
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada; University of California, San Francisco, San Francisco, CA, USA
- Qian Xiang
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Fan Zhang
- Peking University, Beijing, China
- Junjun Zhang
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Peter Bailey
- University of Glasgow, Glasgow, UK
- Serap Erkek
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Katherine A. Hoadley
- The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Yong Hou
- BGI-Shenzhen, Shenzhen, China; China National GeneBank-Shenzhen, Shenzhen, China
- Matthew R. Huska
- Berlin Institute for Medical Systems Biology, Max Delbruck Center for Molecular Medicine, Berlin, Germany
- Helena Kilpinen
- University College London, London, UK
- Jan O. Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Maximillian G. Marin
- University of California, Santa Cruz, Santa Cruz, CA, USA
- Julia Markowski
- Berlin Institute for Medical Systems Biology, Max Delbruck Center for Molecular Medicine, Berlin, Germany
- Tannistha Nandi
- Genome Institute of Singapore, Singapore, Singapore
- Qiang Pan-Hammarström
- BGI-Shenzhen, Shenzhen, China; Karolinska Institutet, Stockholm, Sweden
- Chandra Sekhar Pedamallu
- Broad Institute, Cambridge, MA, USA; Dana-Farber Cancer Institute, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Reiner Siebert
- Ulm University and Ulm University Medical Center, Ulm, Germany
- Stefan G. Stark
- ETH Zurich, Zurich, Switzerland; Memorial Sloan Kettering Cancer Center, New York, NY, USA; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; University Hospital Zurich, Zurich, Switzerland
- Hong Su
- BGI-Shenzhen, Shenzhen, China; China National GeneBank-Shenzhen, Shenzhen, China
- Patrick Tan
- Genome Institute of Singapore, Singapore, Singapore; Duke-NUS Medical School, Singapore, Singapore
- Sebastian M. Waszak
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Christina Yung
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Shida Zhu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank-Shenzhen, Shenzhen, China
- Philip Awadalla
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Chad J. Creighton
- Baylor College of Medicine, Houston, TX, USA
- Matthew Meyerson
- Broad Institute, Cambridge, MA, USA; Dana-Farber Cancer Institute, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Kui Wu
- BGI-Shenzhen, Shenzhen, China; China National GeneBank-Shenzhen, Shenzhen, China
- Huanming Yang
- BGI-Shenzhen, Shenzhen, China
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Angela N. Brooks
- University of California, Santa Cruz, Santa Cruz, CA, USA; Broad Institute, Cambridge, MA, USA; Dana-Farber Cancer Institute, Boston, MA, USA
- Jonathan Göke
- Genome Institute of Singapore, Singapore, Singapore; National Cancer Centre Singapore, Singapore, Singapore
- Gunnar Rätsch
- ETH Zurich, Zurich, Switzerland; Memorial Sloan Kettering Cancer Center, New York, NY, USA; Weill Cornell Medical College, New York, NY, USA; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; University Hospital Zurich, Zurich, Switzerland
- Roland F. Schwarz
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK; Berlin Institute for Medical Systems Biology, Max Delbruck Center for Molecular Medicine, Berlin, Germany; German Cancer Consortium (DKTK), partner site Berlin, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK; European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
- Zemin Zhang
- Peking University, Beijing, China
12
Mustafa H, Schilken I, Karasikov M, Eickhoff C, Rätsch G, Kahles A. Dynamic compression schemes for graph coloring. Bioinformatics 2019; 35:407-414. [PMID: 30020403 PMCID: PMC6530811 DOI: 10.1093/bioinformatics/bty632] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/19/2018] [Accepted: 07/16/2018] [Indexed: 11/30/2022]
Abstract
Motivation Technological advancements in high-throughput DNA sequencing have led to an exponential growth of sequencing data being produced and stored as a byproduct of biomedical research. Despite its public availability, a majority of this data remains hard to query for the research community due to a lack of efficient data representation and indexing solutions. One available technique is to represent read data in condensed form as an assembly graph. Such a representation contains all sequence information but does not store contextual information and metadata. Results We present two new approaches for a compressed representation of a graph coloring: a lossless compression scheme based on a novel application of wavelet tries as well as a highly accurate lossy compression based on a set of Bloom filters. Both strategies retain the coloring even when the underlying graph topology is extended. We present construction and merge procedures for both methods and evaluate their performance on a wide range of different datasets. By dropping the requirement of a fully lossless compression and using the topological information of the underlying graph, we can reduce memory requirements by up to three orders of magnitude. Because individual colors are represented as independently stored modules, our approaches can be efficiently parallelized and provide strategies for dynamic use. These properties make it easy to scale up to the problem sizes common in the biomedical domain. Availability and implementation We provide prototype implementations in C++, summaries of our experiments as well as links to all datasets publicly at https://github.com/ratschlab/graph_annotation. Supplementary information Supplementary data are available at Bioinformatics online.
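The lossy scheme in this abstract is based on a set of Bloom filters, with individual colors stored as independent modules. A minimal Python sketch of just that per-color idea follows. It is not the authors' C++ prototype: it omits the use of graph topology to correct false positives, and the parameters (1024 bits, 3 hash functions, salted SHA-256 hashing, k-mers as the annotated graph elements) are arbitrary assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, Iterator, List, Set


class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(seq: str, k: int) -> List[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class BloomColoring:
    """Lossy graph coloring: one independently stored Bloom filter per color."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.filters: Dict[str, BloomFilter] = {}

    def add_sequence(self, color: str, seq: str, k: int) -> None:
        if color not in self.filters:
            self.filters[color] = BloomFilter(self.num_bits, self.num_hashes)
        for kmer in kmers(seq, k):
            self.filters[color].add(kmer)

    def colors(self, kmer: str) -> Set[str]:
        # May include false-positive colors, but never misses a true one.
        return {color for color, bf in self.filters.items() if kmer in bf}


coloring = BloomColoring()
coloring.add_sequence("sample_A", "ACGTACGTTG", k=4)
coloring.add_sequence("sample_B", "TTGACCAGTA", k=4)
print("sample_A" in coloring.colors("ACGT"))  # True
```

Because each color lives in its own filter in this design, adding a new sample only creates or updates one filter, which is what makes dynamic updates and parallel construction straightforward.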
Affiliation(s)
- Harun Mustafa
- Department of Computer Science, ETH Zurich, Zurich, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Ingo Schilken
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Mikhail Karasikov
- Department of Computer Science, ETH Zurich, Zurich, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Carsten Eickhoff
- Brown Center for Biomedical Informatics, Brown University, Providence, RI, USA
- Gunnar Rätsch
- Department of Computer Science, ETH Zurich, Zurich, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- André Kahles
- Department of Computer Science, ETH Zurich, Zurich, Switzerland; Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
13
Kahles A, Lehmann KV, Toussaint NC, Hüser M, Stark SG, Sachsenberg T, Stegle O, Kohlbacher O, Sander C, Rätsch G. Comprehensive Analysis of Alternative Splicing Across Tumors from 8,705 Patients. Cancer Cell 2018; 34:211-224.e6. [PMID: 30078747 PMCID: PMC9844097 DOI: 10.1016/j.ccell.2018.07.001] [Citation(s) in RCA: 483] [Impact Index Per Article: 80.5] [Received: 02/16/2018] [Revised: 03/30/2018] [Accepted: 07/02/2018] [Indexed: 01/19/2023]
Abstract
Our comprehensive analysis of alternative splicing across 32 The Cancer Genome Atlas cancer types from 8,705 patients detects alternative splicing events and tumor variants by reanalyzing RNA and whole-exome sequencing data. Tumors have up to 30% more alternative splicing events than normal samples. Association analysis of somatic variants with alternative splicing events confirmed known trans associations with variants in SF3B1 and U2AF1 and identified additional trans-acting variants (e.g., TADA1, PPP2R1A). Many tumors have thousands of alternative splicing events not detectable in normal samples; on average, we identified ≈930 exon-exon junctions ("neojunctions") in tumors not typically found in GTEx normals. From Clinical Proteomic Tumor Analysis Consortium data available for breast and ovarian tumor samples, we confirmed ≈1.7 neojunction- and ≈0.6 single nucleotide variant-derived peptides per tumor sample that are also predicted major histocompatibility complex-I binders ("putative neoantigens").
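The abstract describes neojunctions as exon-exon junctions found in tumors but not typically found in GTEx normals. The Python sketch below shows one way to express such a filter. The junction encoding and the 1% normal-frequency cutoff are hypothetical; the abstract does not state the thresholds used in the study.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical junction key: (chromosome, donor position, acceptor position, strand).
Junction = Tuple[str, int, int, str]


def neojunctions(
    tumor: Set[Junction],
    normal_samples: List[Set[Junction]],
    max_normal_fraction: float = 0.01,
) -> Set[Junction]:
    """Tumor junctions seen in at most max_normal_fraction of normal samples."""
    # Count in how many normal samples each junction occurs (a set counts once per sample).
    support: Counter = Counter()
    for junctions in normal_samples:
        support.update(junctions)
    n = len(normal_samples)
    if n == 0:
        return set(tumor)  # no normal reference: nothing can be excluded
    return {j for j in tumor if support[j] / n <= max_normal_fraction}


normals = [
    {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+")},
    {("chr1", 100, 200, "+")},
]
tumor = {("chr1", 100, 200, "+"), ("chr1", 100, 400, "+")}
print(neojunctions(tumor, normals))  # {('chr1', 100, 400, '+')}
```

Counting support per sample rather than per read keeps a single deeply sequenced normal from dominating the filter; a real pipeline would also need read-level confidence thresholds, which are omitted here.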
Affiliation(s)
- André Kahles
- ETH Zurich, Department of Computer Science, Zurich, Switzerland; Memorial Sloan Kettering Cancer Center, Computational Biology Department, New York, USA; University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
- Kjong-Van Lehmann
- ETH Zurich, Department of Computer Science, Zurich, Switzerland; Memorial Sloan Kettering Cancer Center, Computational Biology Department, New York, USA; University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
- Nora C Toussaint
- ETH Zurich, NEXUS Personalized Health Technologies, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
- Matthias Hüser
- ETH Zurich, Department of Computer Science, Zurich, Switzerland; University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
- Stefan G Stark
- ETH Zurich, Department of Computer Science, Zurich, Switzerland; Memorial Sloan Kettering Cancer Center, Computational Biology Department, New York, USA; University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
- Timo Sachsenberg
- University of Tübingen, Department of Computer Science, Tübingen, Germany
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
- Oliver Kohlbacher
- University of Tübingen, Department of Computer Science, Tübingen, Germany; Center for Bioinformatics, University of Tübingen, Tübingen, Germany; Quantitative Biology Center, University of Tübingen, Tübingen, Germany; Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany; Institute for Translational Bioinformatics, University Medical Center, Tübingen, Germany
- Chris Sander
- Dana-Farber Cancer Institute, cBio Center, Department of Biostatistics and Computational Biology, Boston, MA, USA; Harvard Medical School, CompBio Collaboratory, Department of Cell Biology, Boston, USA
- Gunnar Rätsch
- ETH Zurich, Department of Computer Science, Zurich, Switzerland; Memorial Sloan Kettering Cancer Center, Computational Biology Department, New York, USA; University Hospital Zurich, Biomedical Informatics Research, Zurich, Switzerland; ETH Zurich, Department of Biology, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
14
Hartmann L, Drewe-Boß P, Wießner T, Wagner G, Geue S, Lee HC, Obermüller DM, Kahles A, Behr J, Sinz FH, Rätsch G, Wachter A. Alternative Splicing Substantially Diversifies the Transcriptome during Early Photomorphogenesis and Correlates with the Energy Availability in Arabidopsis. Plant Cell 2016; 28:2715-2734. [PMID: 27803310 PMCID: PMC5155347 DOI: 10.1105/tpc.16.00508] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 06/23/2016] [Revised: 10/07/2016] [Accepted: 10/31/2016] [Indexed: 05/18/2023]
Abstract
Plants use light as a source of energy and information to detect diurnal rhythms and seasonal changes. Sensing changing light conditions is critical to adjust plant metabolism and to initiate developmental transitions. Here, we analyzed transcriptome-wide alterations in gene expression and alternative splicing (AS) of etiolated seedlings undergoing photomorphogenesis upon exposure to blue, red, or white light. Our analysis revealed massive transcriptome reprogramming as reflected by differential expression of ∼20% of all genes and changes in several hundred AS events. For more than 60% of all regulated AS events, light promoted the production of a presumably protein-coding variant at the expense of an mRNA with nonsense-mediated decay-triggering features. Accordingly, AS of the putative splicing factor REDUCED RED-LIGHT RESPONSES IN CRY1CRY2 BACKGROUND1, previously identified as a red light signaling component, was shifted to the functional variant under light. Downstream analyses of candidate AS events pointed to a role of photoreceptor signaling in monochromatic, but not in white, light. Furthermore, we demonstrated similar AS changes upon light exposure and exogenous sugar supply, with a critical involvement of kinase signaling. We propose that AS is an integration point of signaling pathways that sense and transmit information regarding the energy availability in plants.
Affiliation(s)
- Lisa Hartmann
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
- Philipp Drewe-Boß
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 13092 Berlin, Germany
- Theresa Wießner
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
- Gabriele Wagner
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
- Sascha Geue
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
- Hsin-Chieh Lee
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
- Dominik M Obermüller
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
- André Kahles
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Jonas Behr
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Fabian H Sinz
- Institute for Neurobiology, University of Tübingen, 72076 Tübingen, Germany
- Gunnar Rätsch
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Department of Computer Science, ETH Zürich, 8006 Zürich, Switzerland
- Andreas Wachter
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
15
Kahles A, Ong CS, Zhong Y, Rätsch G. SplAdder: identification, quantification and testing of alternative splicing events from RNA-Seq data. Bioinformatics 2016; 32:1840-7. [PMID: 26873928 PMCID: PMC4908322 DOI: 10.1093/bioinformatics/btw076] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Received: 03/25/2015] [Revised: 12/18/2015] [Accepted: 02/04/2016] [Indexed: 11/12/2022]
Abstract
MOTIVATION Understanding the occurrence and regulation of alternative splicing (AS) is a key task towards explaining the regulatory processes that shape the complex transcriptomes of higher eukaryotes. With the advent of high-throughput sequencing of RNA (RNA-Seq), the diversity of AS transcripts could be measured at an unprecedented depth. Although the catalog of known AS events has grown ever since, novel transcripts are commonly observed when working with less well annotated organisms, in the context of disease, or within large populations. Whereas identifying complete transcripts is technically challenging and computationally expensive, focusing on single splicing events as a proxy for transcriptome characteristics is fruitful and sufficient for a wide range of analyses. RESULTS We present SplAdder, an alternative splicing toolbox that takes RNA-Seq alignments and an annotation file as input to (i) augment the annotation based on RNA-Seq evidence, (ii) identify alternative splicing events present in the augmented annotation graph, (iii) quantify and confirm these events based on the RNA-Seq data and (iv) test for significant quantitative differences between samples. Throughout, our main focus is on performance, accuracy and usability. AVAILABILITY Source code and documentation are available for download at http://github.com/ratschlab/spladder. Example data, introductory information and a small tutorial are accessible via http://bioweb.me/spladder. CONTACT: andre.kahles@ratschlab.org or gunnar.ratsch@ratschlab.org SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
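Steps (ii) and (iii) in this abstract, identifying events in a splicing graph and quantifying them, can be illustrated for the simplest event type, exon skipping. This is a hypothetical Python sketch, not SplAdder's code or API: exons are coordinate pairs, introns are graph edges, and PSI (percent spliced in) is computed as the mean inclusion-junction support divided by inclusion plus exclusion support, a common convention assumed here.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end); hypothetical genomic coordinates
Intron = Tuple[Exon, Exon]  # splicing-graph edge: upstream exon -> downstream exon
Event = Tuple[Exon, Exon, Exon]  # (upstream exon, skipped exon, downstream exon)


def exon_skip_events(introns: Set[Intron]) -> List[Event]:
    """Find exon triples a, b, c with edges a->b, b->c and the skipping edge a->c."""
    successors: Dict[Exon, Set[Exon]] = {}
    for up, down in introns:
        successors.setdefault(up, set()).add(down)
    events = []
    for a, downstream in successors.items():
        for b in downstream:
            for c in successors.get(b, set()):
                if c in downstream:  # a can also splice directly to c, skipping b
                    events.append((a, b, c))
    return sorted(events)


def psi(junction_reads: Dict[Intron, int], event: Event) -> float:
    """Mean inclusion-junction support over total support; NaN if no reads."""
    a, b, c = event
    inclusion = (junction_reads.get((a, b), 0) + junction_reads.get((b, c), 0)) / 2
    exclusion = junction_reads.get((a, c), 0)
    total = inclusion + exclusion
    return inclusion / total if total > 0 else float("nan")


e1, e2, e3 = (100, 200), (300, 350), (500, 600)
events = exon_skip_events({(e1, e2), (e2, e3), (e1, e3)})
print(events)  # [((100, 200), (300, 350), (500, 600))]
print(psi({(e1, e2): 30, (e2, e3): 10, (e1, e3): 20}, events[0]))  # 0.5
```

Averaging the two inclusion junctions puts inclusion and exclusion on the same scale, since an included exon is supported by two junctions while a skip is supported by one.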
Affiliation(s)
- André Kahles
- Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Cheng Soon Ong
- Canberra Research Laboratory, NICTA, Canberra, ACT 2601, Australia
- Yi Zhong
- Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Gunnar Rätsch
- Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
16
Abstract
Motivation: Mapping high-throughput sequencing data to a reference genome is an essential step for most analysis pipelines aiming at the computational analysis of genome and transcriptome sequencing data. Breaking ties between equally well-mapping locations not only poses a severe problem during the alignment phase but also has a significant impact on the results of downstream analyses. We present the multi-mapper resolution (MMR) tool that infers optimal mapping locations from the coverage density of other mapped reads. Results: Filtering alignments with MMR can significantly improve the performance of downstream analyses like transcript quantitation and differential testing. We illustrate that the accuracy (Spearman correlation) of transcript quantification increases by 15% when using reads of length 51. In addition, MMR decreases the alignment file sizes by more than 50%, which reduces the running time of the quantification tool. Our efficient implementation of the MMR algorithm is easily applicable as a post-processing step to existing alignment files in BAM format. Its computational cost scales linearly with the number of alignments, and it requires no further inputs. Availability and implementation: Open source code and documentation are available for download at http://github.com/ratschlab/mmr. Comprehensive testing results and further information can be found at http://bioweb.me/mmr. Contact: andre.kahles@ratschlab.org or gunnar.ratsch@ratschlab.org Supplementary information: Supplementary data are available at Bioinformatics online.
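The abstract states that MMR infers mapping locations from the coverage density of other mapped reads, but it does not give the objective function. The Python sketch below is a simplified, hypothetical greedy variant on a toy genome, not MMR's actual algorithm: each multi-mapped read is moved to the candidate start with the highest mean coverage from all other reads, and passes repeat until no read moves. The function name, the fixed read length, and the dictionary inputs are assumptions; it does not read BAM files.

```python
from typing import Dict, List


def resolve_multimappers(
    genome_len: int,
    unique: Dict[str, int],
    multi: Dict[str, List[int]],
    read_len: int,
    max_passes: int = 5,
) -> Dict[str, int]:
    """Greedy coverage-based multi-mapper resolution (simplified, hypothetical).

    unique: read id -> start of its single alignment
    multi:  read id -> start positions of its equally good alignments
    """
    placement = dict(unique)
    placement.update({rid: starts[0] for rid, starts in multi.items()})  # initial guess
    cov = [0] * genome_len

    def add(start: int, delta: int) -> None:
        for i in range(start, min(start + read_len, genome_len)):
            cov[i] += delta

    def density(start: int) -> float:
        window = cov[start:start + read_len]
        return sum(window) / max(len(window), 1)

    for start in placement.values():
        add(start, 1)

    for _ in range(max_passes):
        changed = False
        for rid, starts in multi.items():
            add(placement[rid], -1)  # score candidates against the other reads only
            best = max(starts, key=density)  # ties keep the first listed candidate
            if best != placement[rid]:
                changed = True
            placement[rid] = best
            add(best, 1)
        if not changed:  # stop once no read moves
            break
    return placement


unique = {"u1": 10, "u2": 12, "u3": 14}
multi = {"m1": [50, 11]}  # two equally good alignments
print(resolve_multimappers(100, unique, multi, read_len=5)["m1"])  # 11
```

In this toy version each pass visits every candidate alignment once and touches only the read span when updating coverage, so the cost per pass grows linearly with the total number of candidate alignments.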
Affiliation(s)
- André Kahles
- Memorial Sloan Kettering Cancer Center, Computational Biology Center, 1275 York Avenue, New York, NY 10065, USA
- Jonas Behr
- Memorial Sloan Kettering Cancer Center, Computational Biology Center, 1275 York Avenue, New York, NY 10065, USA
- Gunnar Rätsch
- Memorial Sloan Kettering Cancer Center, Computational Biology Center, 1275 York Avenue, New York, NY 10065, USA
17
Dubin MJ, Zhang P, Meng D, Remigereau MS, Osborne EJ, Paolo Casale F, Drewe P, Kahles A, Jean G, Vilhjálmsson B, Jagoda J, Irez S, Voronin V, Song Q, Long Q, Rätsch G, Stegle O, Clark RM, Nordborg M. DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation. eLife 2015; 4:e05255. [PMID: 25939354 DOI: 10.7554/elife.05255.031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/20/2014] [Accepted: 03/26/2015] [Indexed: 05/20/2023]
Abstract
Epigenome modulation potentially provides a mechanism for organisms to adapt, within and between generations. However, neither the extent to which this occurs, nor the mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association studies (GWAS) revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) was not affected by growth temperature, but was instead correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was associated with increased transcription for the genes affected. GWAS revealed that this effect was largely due to trans-acting loci, many of which showed evidence of local adaptation.
Affiliation(s)
- Manu J Dubin
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Pei Zhang
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Dazhe Meng
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Edward J Osborne
- Department of Biology, University of Utah, Salt Lake City, United States
- Francesco Paolo Casale
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Philipp Drewe
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- André Kahles
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- Geraldine Jean
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- Bjarni Vilhjálmsson
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Joanna Jagoda
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Selen Irez
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Viktor Voronin
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Qiang Song
- Molecular and Computational Biology, University of Southern California, Los Angeles, United States
- Quan Long
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Gunnar Rätsch
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Richard M Clark
- Department of Biology, University of Utah, Salt Lake City, United States
- Magnus Nordborg
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
18
Dubin MJ, Zhang P, Meng D, Remigereau MS, Osborne EJ, Paolo Casale F, Drewe P, Kahles A, Jean G, Vilhjálmsson B, Jagoda J, Irez S, Voronin V, Song Q, Long Q, Rätsch G, Stegle O, Clark RM, Nordborg M. DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation. eLife 2015; 4:e05255. [PMID: 25939354 PMCID: PMC4413256 DOI: 10.7554/elife.05255] [Received: 10/20/2014] [Accepted: 03/26/2015]
Abstract
Epigenome modulation potentially provides a mechanism for organisms to adapt, within and between generations. However, neither the extent to which this occurs, nor the mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association studies (GWAS) revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) was not affected by growth temperature, but was instead correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was associated with increased transcription for the genes affected. GWAS revealed that this effect was largely due to trans-acting loci, many of which showed evidence of local adaptation.
DOI: http://dx.doi.org/10.7554/eLife.05255.001
Organisms need to adapt quickly to changes in their environment. Mutations in the DNA sequence of genes can lead to new adaptations, but this can take many generations. Instead, altering how genes are switched on by changing how the DNA is packaged in cells can allow organisms to adapt within and between generations. One way that genes are controlled in organisms is by a process known as DNA methylation, where ‘methyl’ tags are added to DNA and act as markers for other proteins involved in activating genes. DNA is made of four different molecules called ‘nucleotides’ that are arranged in different orders to produce a vast variety of DNA sequences. One type of DNA methylation can happen at sites where a nucleotide called cytosine is followed by two other non-cytosine nucleotides. Another type of methylation can take place at sites where a cytosine is followed by a guanine nucleotide. However, it is not clear how big a role DNA methylation plays in allowing organisms to adapt to their changing environment.
Here, Dubin, Zhang, Meng, Remigereau et al. studied DNA methylation in a plant called Arabidopsis thaliana. Several different varieties of A. thaliana plants from Sweden were grown at two different temperatures. The experiments showed that the A. thaliana plants grown at higher temperatures were more likely to have methyl tags attached to sections of DNA called transposons, which are able to move around the genome. There was a lot of variety in the levels of this DNA methylation in the different plants, and some of it was shown to be associated with variation in a gene that is involved in DNA methylation. However, not all of the DNA methylation in these plants was sensitive to the temperature the plants were grown in. Dubin, Zhang, Meng, Remigereau et al. show that the pattern of a type of DNA methylation that is found within genes depends on how far north in Sweden the plants' ancestors came from rather than the temperature the plants were grown in. Plants that originated from colder regions, farther north, had more DNA methylation within many genes and these genes were more active. These findings suggest that genetic differences in these plants strongly influence the levels of DNA methylation, and they provide the first direct link between DNA methylation and adaption to the environment. Future studies should reveal how DNA methylation is regulated in these plants, and whether it plays a key role in adaptation, or merely reflects other changes in the genome.
DOI: http://dx.doi.org/10.7554/eLife.05255.002
Affiliation(s)
- Manu J Dubin
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Pei Zhang
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Dazhe Meng
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Edward J Osborne
- Department of Biology, University of Utah, Salt Lake City, United States
- Francesco Paolo Casale
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Philipp Drewe
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- André Kahles
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- Geraldine Jean
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- Bjarni Vilhjálmsson
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Joanna Jagoda
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Selen Irez
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Viktor Voronin
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Qiang Song
- Molecular and Computational Biology, University of Southern California, Los Angeles, United States
- Quan Long
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Gunnar Rätsch
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Richard M Clark
- Department of Biology, University of Utah, Salt Lake City, United States
- Magnus Nordborg
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
19
Dubin MJ, Zhang P, Meng D, Remigereau MS, Osborne EJ, Paolo Casale F, Drewe P, Kahles A, Jean G, Vilhjálmsson B, Jagoda J, Irez S, Voronin V, Song Q, Long Q, Rätsch G, Stegle O, Clark RM, Nordborg M. DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation. eLife 2015; 4:e05255. [PMID: 25939354 DOI: 10.7554/elife.05255.001] [Received: 10/20/2014] [Accepted: 03/26/2015]
Abstract
Epigenome modulation potentially provides a mechanism for organisms to adapt, within and between generations. However, neither the extent to which this occurs, nor the mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association studies (GWAS) revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) was not affected by growth temperature, but was instead correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was associated with increased transcription for the genes affected. GWAS revealed that this effect was largely due to trans-acting loci, many of which showed evidence of local adaptation.
Affiliation(s)
- Manu J Dubin
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Pei Zhang
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Dazhe Meng
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Edward J Osborne
- Department of Biology, University of Utah, Salt Lake City, United States
- Francesco Paolo Casale
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Philipp Drewe
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- André Kahles
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- Geraldine Jean
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- Bjarni Vilhjálmsson
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Joanna Jagoda
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Selen Irez
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Viktor Voronin
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Qiang Song
- Molecular and Computational Biology, University of Southern California, Los Angeles, United States
- Quan Long
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Gunnar Rätsch
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Richard M Clark
- Department of Biology, University of Utah, Salt Lake City, United States
- Magnus Nordborg
- Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
20
Lehmann KV, Kahles A, Kandoth C, Lee W, Schultz N, Stegle O, Rätsch G. Integrative genome-wide analysis of the determinants of RNA splicing in kidney renal clear cell carcinoma. Pac Symp Biocomput 2015; 20:44-55. [PMID: 25592567 PMCID: PMC4333684]
Abstract
We present a genome-wide analysis of splicing patterns of 282 kidney renal clear cell carcinoma patients in which we integrate data from whole-exome sequencing of tumor and normal samples, RNA-seq and copy number variation. We propose a scoring mechanism that compares splicing patterns in tumor samples to those in normal samples in order to rank and detect tumor-specific isoforms with potential as new biomarkers. We identified a subset of genes with introns that are observable only in tumor samples and not in normal, ENCODE or GEUVADIS samples. To improve our understanding of the genetic mechanisms underlying splicing variation, we performed a large-scale association analysis to find links between somatic or germline variants and alternative splicing events. We identified 915 cis- and trans-splicing quantitative trait loci (sQTL) associated with changes in splicing patterns. Some of these sQTL have previously been identified as susceptibility loci for cancer and other diseases. Our analysis also allowed us to identify the function of several COSMIC variants showing significant association with changes in alternative splicing. This demonstrates the potential significance of variants affecting alternative splicing events and yields insights into the mechanisms related to an array of disease phenotypes.
Affiliation(s)
- Kjong-Van Lehmann
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10044, U.S.A
21
Sreedharan VT, Schultheiss SJ, Jean G, Kahles A, Bohnert R, Drewe P, Mudrakarta P, Görnitz N, Zeller G, Rätsch G. Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. Bioinformatics 2014; 30:1300-1. [PMID: 24413671 PMCID: PMC3998122 DOI: 10.1093/bioinformatics/btt731] [Received: 05/03/2013] [Revised: 11/09/2013] [Accepted: 12/13/2013]
Abstract
We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated into Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate with easy-to-understand use cases how Oqtans aids the interpretation of data from different experiments. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git), most of which is also available from (iv) the Galaxy Toolshed, and (v) a share string to use along with Galaxy CloudMan.
Affiliation(s)
- Vipin T Sreedharan
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY, USA, Machine Learning in Biology Group, Friedrich Miescher Laboratory, Tübingen, Germany, LINA, Combinatorics and Bioinformatics Group, University of Nantes, Nantes, France, Machine Learning/Intelligent Data Analysis Group, Technical University, Berlin, Germany and Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
22
Sreedharan VT, Schultheiss SJ, Jean G, Kahles A, Bohnert R, Drewe P, Mudrakarta P, Görnitz N, Zeller G, Rätsch G. Oqtans: a multifunctional workbench for RNA-seq data analysis. BMC Bioinformatics 2014. [PMCID: PMC4072424 DOI: 10.1186/1471-2105-15-s3-a7]
23
Kahles A, Sarqume F, Savolainen P, Arvestad L. Excap: maximization of haplotypic diversity of linked markers. PLoS One 2013; 8:e79012. [PMID: 24244403 PMCID: PMC3820696 DOI: 10.1371/journal.pone.0079012] [Received: 12/12/2012] [Accepted: 09/18/2013]
Abstract
Genetic markers, defined as variable regions of DNA, can be utilized for distinguishing individuals or populations. As long as markers are independent, it is easy to combine the information they provide. For nonrecombinant sequences like mtDNA, choosing the right set of markers for forensic applications can be difficult and requires careful consideration. In particular, one wants to maximize the utility of the markers. Until now, this has mainly been done by hand. We propose an algorithm that finds the most informative subset of a set of markers. The algorithm uses a depth first search combined with a branch-and-bound approach. Since the worst case complexity is exponential, we also propose some data-reduction techniques and a heuristic. We implemented the algorithm and applied it to two forensic caseworks using mitochondrial DNA, which resulted in marker sets with significantly improved haplotypic diversity compared to previous suggestions. Additionally, we evaluated the quality of the estimation with an artificial dataset of mtDNA. The heuristic is shown to provide extensive speedup at little cost in accuracy.
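The abstract names a depth-first search combined with branch-and-bound. As an illustration of that general technique only, the Python sketch below assumes one specific objective: choose exactly k markers that maximize Nei's haplotype diversity, H = n/(n-1) * (1 - sum of p_i^2), where p_i is the frequency of each distinct haplotype among n sequences. The objective, the bound and the function names `haplotype_diversity` and `best_marker_subset` are assumptions for this sketch, not Excap's published implementation, and the paper's data-reduction techniques and heuristic are omitted.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def haplotype_diversity(samples: Sequence[str], markers: Sequence[int]) -> float:
    """Nei's unbiased haplotype diversity of `samples` restricted to `markers` (assumed objective)."""
    n = len(samples)
    if n < 2:
        return 0.0
    # A haplotype is the tuple of alleles observed at the chosen marker positions.
    counts = Counter(tuple(s[m] for m in markers) for s in samples)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def best_marker_subset(samples: Sequence[str], k: int) -> Tuple[float, List[int]]:
    """Depth-first search over k-subsets of markers with branch-and-bound pruning (hypothetical helper).

    Bound: adding markers can only split haplotype classes, and splitting a class
    never increases sum(p_i^2), so H(chosen + all remaining markers) is an upper
    bound on every completion of the current partial subset.
    """
    n_markers = len(samples[0])
    best_h = -1.0
    best_set: List[int] = []

    def dfs(start: int, chosen: List[int]) -> None:
        nonlocal best_h, best_set
        if len(chosen) == k:
            h = haplotype_diversity(samples, chosen)
            if h > best_h:
                best_h, best_set = h, list(chosen)
            return
        remaining = list(range(start, n_markers))
        if len(chosen) + len(remaining) < k:
            return  # not enough markers left to reach size k
        if haplotype_diversity(samples, chosen + remaining) <= best_h:
            return  # prune: no completion of this branch can beat the incumbent
        for m in remaining:
            chosen.append(m)
            dfs(m + 1, chosen)
            chosen.pop()

    dfs(0, [])
    return best_h, best_set
```

For example, with the toy sequences `["AAC", "AGC", "GAT", "GGT"]` and k = 2, marker 2 splits the samples exactly like marker 0, so the search returns markers `[0, 1]` with a diversity of 1.0 (up to floating-point rounding) and prunes the branch starting at marker 1 because its bound cannot exceed the incumbent. Like any exact branch-and-bound, the worst case remains exponential in the number of markers.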
Affiliation(s)
- André Kahles
- KTH Royal Institute of Technology, Stockholm Bioinformatics Center, School of Computer Science and Communication, Stockholm, Sweden
24
Behr J, Kahles A, Zhong Y, Sreedharan VT, Drewe P, Rätsch G. MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples. Bioinformatics 2013; 29:2529-38. [PMID: 23980025 PMCID: PMC3789545 DOI: 10.1093/bioinformatics/btt442] [Received: 10/14/2012] [Revised: 07/19/2013] [Accepted: 07/29/2013]
Abstract
MOTIVATION High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. RESULTS We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques leads to significant improvements over the state-of-the-art in transcriptome reconstruction. AVAILABILITY MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.
Affiliation(s)
- Jonas Behr
- Computational Biology Center, Sloan-Kettering Institute, 1275 York Avenue, New York, NY 10065, USA and Friedrich Miescher Laboratory, Max Planck Society, Spemannstr. 39, 72076 Tübingen, Germany
25
Drechsel G, Kahles A, Kesarwani AK, Stauffer E, Behr J, Drewe P, Rätsch G, Wachter A. Nonsense-mediated decay of alternative precursor mRNA splicing variants is a major determinant of the Arabidopsis steady state transcriptome. Plant Cell 2013; 25:3726-42. [PMID: 24163313 PMCID: PMC3877825 DOI: 10.1105/tpc.113.115485] [Received: 06/27/2013] [Revised: 09/17/2013] [Accepted: 10/07/2013]
Abstract
The nonsense-mediated decay (NMD) surveillance pathway can recognize erroneous transcripts and physiological mRNAs, such as precursor mRNA alternative splicing (AS) variants. Currently, information on the global extent of coupled AS and NMD remains scarce and even absent for any plant species. To address this, we conducted transcriptome-wide splicing studies using Arabidopsis thaliana mutants in the NMD factor homologs UP FRAMESHIFT1 (UPF1) and UPF3 as well as wild-type samples treated with the translation inhibitor cycloheximide. Our analyses revealed that at least 17.4% of all multi-exon, protein-coding genes produce splicing variants that are targeted by NMD. Moreover, we provide evidence that UPF1 and UPF3 act in a translation-independent mRNA decay pathway. Importantly, 92.3% of the NMD-responsive mRNAs exhibit classical NMD-eliciting features, supporting their authenticity as direct targets. Genes generating NMD-sensitive AS variants function in diverse biological processes, including signaling and protein modification, for which NaCl stress-modulated AS-NMD was found. Besides mRNAs, numerous noncoding RNAs and transcripts derived from intergenic regions were shown to be NMD responsive. In summary, we provide evidence for a major function of AS-coupled NMD in shaping the Arabidopsis transcriptome, having fundamental implications in gene regulation and quality control of transcript processing.
Affiliation(s)
- Gabriele Drechsel
- Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
- André Kahles
- Computational Biology Center, Sloan-Kettering Institute, New York, New York 10065
- Anil K. Kesarwani
- Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
- Eva Stauffer
- Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
- Jonas Behr
- Computational Biology Center, Sloan-Kettering Institute, New York, New York 10065
- Philipp Drewe
- Computational Biology Center, Sloan-Kettering Institute, New York, New York 10065
- Gunnar Rätsch
- Computational Biology Center, Sloan-Kettering Institute, New York, New York 10065
- Andreas Wachter
- Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
26
Drewe P, Stegle O, Hartmann L, Kahles A, Bohnert R, Wachter A, Borgwardt K, Rätsch G. Accurate detection of differential RNA processing. Nucleic Acids Res 2013; 41:5189-98. [PMID: 23585274 PMCID: PMC3664801 DOI: 10.1093/nar/gkt211]
Abstract
Deep transcriptome sequencing (RNA-Seq) has become a vital tool for studying the state of cells in the context of varying environments, genotypes and other factors. RNA-Seq profiling data enable identification of novel isoforms, quantification of known isoforms and detection of changes in transcriptional or RNA-processing activity. Existing approaches to detect differential isoform abundance between samples either require a complete isoform annotation or fall short in providing statistically robust and calibrated significance estimates. Here, we propose a suite of statistical tests to address these open needs: a parametric test that uses known isoform annotations to detect changes in relative isoform abundance and a non-parametric test that detects differential read coverages and can be applied when isoform annotations are not available. Both methods account for the discrete nature of read counts and the inherent biological variability. We demonstrate that these tests compare favorably to previous methods, both in terms of accuracy and statistical calibrations. We use these techniques to analyze RNA-Seq libraries from Arabidopsis thaliana and Drosophila melanogaster. The identified differential RNA processing events were consistent with RT–qPCR measurements and previous studies. The proposed toolkit is available from http://bioweb.me/rdiff and enables in-depth analyses of transcriptomes, with or without available isoform annotation.
Affiliation(s)
- Philipp Drewe
- Computational Biology Center, Sloan-Kettering Institute, 1275 York Avenue, New York, NY 10065, USA.
27
Rühl C, Stauffer E, Kahles A, Wagner G, Drechsel G, Rätsch G, Wachter A. Polypyrimidine tract binding protein homologs from Arabidopsis are key regulators of alternative splicing with implications in fundamental developmental processes. Plant Cell 2012; 24:4360-75. [PMID: 23192226 PMCID: PMC3531839 DOI: 10.1105/tpc.112.103622] [Received: 08/01/2012] [Revised: 10/09/2012] [Accepted: 10/24/2012]
Abstract
Alternative splicing (AS) generates transcript variants by variable exon/intron definition and massively expands transcriptome diversity. Changes in AS patterns have been found to be linked to manifold biological processes, yet fundamental aspects, such as the regulation of AS and its functional implications, largely remain to be addressed. In this work, widespread AS regulation by Arabidopsis thaliana Polypyrimidine tract binding protein homologs (PTBs) was revealed. In total, 452 AS events derived from 307 distinct genes were found to be responsive to the levels of the splicing factors PTB1 and PTB2, which predominantly triggered splicing of regulated introns, inclusion of cassette exons, and usage of upstream 5' splice sites. By contrast, no major AS regulatory function of the distantly related PTB3 was found. Dependent on their position within the mRNA, PTB-regulated events can both modify the untranslated regions and give rise to alternative protein products. We find that PTB-mediated AS events are connected to diverse biological processes, and the functional implications of selected instances were further elucidated. Specifically, PTB misexpression changes AS of PHYTOCHROME INTERACTING FACTOR6, coinciding with altered rates of abscisic acid-dependent seed germination. Furthermore, AS patterns as well as the expression of key flowering regulators were massively changed in a PTB1/2 level-dependent manner.
Affiliation(s)
- Christina Rühl
- Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
- Eva Stauffer
- Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
- André Kahles
- Computational Biology Center, Sloan-Kettering Institute, New York, New York 10065
- Gabriele Wagner
- Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
- Gabriele Drechsel
- Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
- Gunnar Rätsch
- Computational Biology Center, Sloan-Kettering Institute, New York, New York 10065
- Andreas Wachter
- Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
28
Smith LM, Hartmann L, Drewe P, Bohnert R, Kahles A, Lanz C, Rätsch G. Multiple insert size paired-end sequencing for deconvolution of complex transcriptomes. RNA Biol 2012; 9:596-609. [PMID: 22614838 DOI: 10.4161/rna.19683]
Abstract
Deep sequencing of transcriptomes allows quantitative and qualitative analysis of many RNA species in a sample, the ideal goal being a parallel comparison of expression levels, splicing variants, natural antisense transcripts, RNA editing, and transcriptional start and stop sites. By computational modeling, we show how libraries of multiple insert sizes combined with strand-specific, paired-end (SS-PE) sequencing can increase the information gained on alternative splicing, especially in higher eukaryotes. Despite the benefits of gaining SS-PE data with paired ends of varying distance, the standard Illumina protocol allows only non-strand-specific, paired-end sequencing with a single insert size. Here, we modify the Illumina RNA ligation protocol to allow SS-PE sequencing by using a custom pre-adenylated 3' adaptor. We generate parallel libraries with differing insert sizes to aid deconvolution of alternative splicing events and to characterize the extent and distribution of natural antisense transcription in C. elegans. Despite stringent requirements for detection of alternative splicing, our data increase the number of intron retention and exon skipping events annotated in the WormBase genome annotation by 127% and 121%, respectively. We show that parallel libraries with a range of insert sizes increase the transcriptomic information gained by sequencing and that, by current established benchmarks, our protocol gives competitive results with respect to library quality.
Affiliation(s)
- Lisa M Smith
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
29
Schultheiss SJ, Jean G, Behr J, Bohnert R, Drewe P, Görnitz N, Kahles A, Mudrakarta P, Sreedharan VT, Zeller G, Rätsch G. Oqtans: a Galaxy-integrated workflow for quantitative transcriptome analysis from NGS Data. BMC Bioinformatics 2011. [PMCID: PMC3277255 DOI: 10.1186/1471-2105-12-s11-a7]
30
Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, Lyngsoe R, Schultheiss SJ, Osborne EJ, Sreedharan VT, Kahles A, Bohnert R, Jean G, Derwent P, Kersey P, Belfield EJ, Harberd NP, Kemen E, Toomajian C, Kover PX, Clark RM, Rätsch G, Mott R. Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature 2011; 477:419-23. [PMID: 21874022 PMCID: PMC4856438 DOI: 10.1038/nature10414] [Received: 06/09/2011] [Accepted: 08/05/2011]
Abstract
Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions.
Affiliation(s)
- Xiangchao Gan
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
31
Abstract
Next-generation sequencing technologies have revolutionized genome and transcriptome sequencing. RNA-Seq experiments are able to generate huge amounts of transcriptome sequence reads at a fraction of the cost of Sanger sequencing. Reads produced by these technologies are relatively short and error prone. To utilize such reads for transcriptome reconstruction and gene-structure identification, one needs to be able to accurately align the sequence reads over intron boundaries. In this unit, we describe PALMapper, a fast and easy-to-use tool that is designed to accurately compute both unspliced and spliced alignments for millions of RNA-Seq reads. It combines the efficient read mapper GenomeMapper with the spliced aligner QPALMA, which exploits read-quality information and predictions of splice sites to improve the alignment accuracy. The PALMapper package is available as a command-line tool running on Unix or Mac OS X systems or through a Web interface based on Galaxy tools.
Affiliation(s)
- Géraldine Jean
- Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany