1
LeRoy NJ, Khoroshevskyi O, O’Brien A, Stępień R, Arslan A, Sheffield NC. PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata. bioRxiv: The Preprint Server for Biology 2024:2023.08.15.551388. [PMID: 37645717; PMCID: PMC10462087; DOI: 10.1101/2023.08.15.551388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Background: As biological data increases, we need additional infrastructure to share it and promote interoperability. While major effort has been put into sharing data, relatively less emphasis is placed on sharing metadata. Yet, sharing metadata is also important and in some ways has a wider scope than sharing data itself.
Results: Here, we present PEPhub, an approach to improve sharing and interoperability of biological metadata. PEPhub provides an API, natural language search, and user-friendly web-based sharing and editing of sample metadata tables. We used PEPhub to process more than 100,000 published biological research projects and index them with fast semantic natural language search. PEPhub thus provides a fast and user-friendly way to find existing biological research data or to share new data.
Availability: https://pephub.databio.org.
Affiliation(s)
- Nathan J. LeRoy
- Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, 22904, Charlottesville VA
- Oleksandr Khoroshevskyi
- Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
- Aaron O’Brien
- Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
- Rafał Stępień
- Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
- Alip Arslan
- Department of Computer Science, School of Engineering, University of Virginia, 22908, Charlottesville VA
- Nathan C. Sheffield
- Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
- School of Data Science, University of Virginia, 22904, Charlottesville VA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, 22904, Charlottesville VA
- Department of Public Health Sciences, School of Medicine, University of Virginia, 22908, Charlottesville VA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, 22908, Charlottesville VA
- Child Health Research Center, School of Medicine, University of Virginia, 22908, Charlottesville VA
2
LeRoy NJ, Khoroshevskyi O, O’Brien A, Stępień R, Arslan A, Sheffield NC. PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata. Gigascience 2024; 13:giae033. [PMID: 38991851; PMCID: PMC11238423; DOI: 10.1093/gigascience/giae033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 02/07/2024] [Accepted: 05/21/2024] [Indexed: 07/13/2024] [Open access]
Abstract
BACKGROUND: As biological data increase, we need additional infrastructure to share them and promote interoperability. While major effort has been put into sharing data, relatively less emphasis is placed on sharing metadata. Yet, sharing metadata is also important and in some ways has a wider scope than sharing data themselves.
RESULTS: Here, we present PEPhub, an approach to improve sharing and interoperability of biological metadata. PEPhub provides an API, natural-language search, and user-friendly web-based sharing and editing of sample metadata tables. We used PEPhub to process more than 100,000 published biological research projects and index them with fast semantic natural-language search. PEPhub thus provides a fast and user-friendly way to find existing biological research data or to share new data.
AVAILABILITY: https://pephub.databio.org.
Affiliation(s)
- Nathan J LeRoy
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- Oleksandr Khoroshevskyi
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Aaron O’Brien
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Rafał Stępień
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Alip Arslan
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Nathan C Sheffield
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Child Health Research Center, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
3
Yu P, Li J, Deng SP, Zhang F, Grozdanov PN, Chin EWM, Martin SD, Vergnes L, Islam MS, Sun D, LaSalle JM, McGee SL, Goh E, MacDonald CC, Jin P. Integrated analysis of a compendium of RNA-Seq datasets for splicing factors. Sci Data 2020; 7:178. [PMID: 32546682; PMCID: PMC7297722; DOI: 10.1038/s41597-020-0514-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/17/2019] [Accepted: 03/13/2020] [Indexed: 02/05/2023] [Open access]
Abstract
A vast number of public RNA-sequencing datasets have been generated and widely used to study transcriptome mechanisms. These data offer a valuable opportunity for advancing biological research in transcriptome studies such as alternative splicing. We report the first large-scale integrated analysis of RNA-Seq data of splicing factors for systematically identifying key factors in diseases and biological processes. We analyzed 1,321 RNA-Seq libraries of various mouse tissues and cell lines, comprising more than 6.6 TB of sequences from 75 independent studies that experimentally manipulated 56 splicing factors. Using these data, RNA splicing signatures and gene expression signatures were computed, and signature comparison analysis identified a list of key splicing factors in Rett syndrome and cold-induced thermogenesis. Using neuronal morphology analysis, we show that cold-induced RNA-binding proteins rescue the neurite outgrowth defects in Rett syndrome, and using metabolic flux analysis, we reveal that SRSF1 and PTBP1 are required for energy expenditure in adipocytes. Our study provides an integrated analysis for identifying key factors in diseases and biological processes and highlights the importance of public data resources for generating hypotheses for experimental testing.
Affiliation(s)
- Peng Yu
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China.
- Medical Big Data Center, Sichuan University, Chengdu, China.
- Jin Li
- Center for Epigenetics & Disease Prevention, Institute of Biosciences and Technology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
- Su-Ping Deng
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu, 215009, China
- Feiran Zhang
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia, USA
- Petar N Grozdanov
- Department of Cell Biology & Biochemistry, Texas Tech University Health Sciences Center, Lubbock, Texas, 79430, USA
- Eunice W M Chin
- Neuroscience Academic Clinical Programme, Duke-NUS Medical School, Singapore
- Sheree D Martin
- Metabolic Reprogramming Laboratory, Metabolic Research Unit, School of Medicine and Centre for Molecular and Medical Research, Deakin University, Geelong, Victoria, Australia
- Laurent Vergnes
- Department of Human Genetics, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA, USA
- M Saharul Islam
- Department of Medical Microbiology and Immunology, Genome Center, and MIND Institute, University of California Davis, Davis, CA, USA
- Deqiang Sun
- Center for Epigenetics & Disease Prevention, Institute of Biosciences and Technology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
- Janine M LaSalle
- Department of Medical Microbiology and Immunology, Genome Center, and MIND Institute, University of California Davis, Davis, CA, USA
- Sean L McGee
- Metabolic Reprogramming Laboratory, Metabolic Research Unit, School of Medicine and Centre for Molecular and Medical Research, Deakin University, Geelong, Victoria, Australia
- Eyleen Goh
- Neuroscience Academic Clinical Programme, Duke-NUS Medical School, Singapore
- Clinton C MacDonald
- Department of Cell Biology & Biochemistry, Texas Tech University Health Sciences Center, Lubbock, Texas, 79430, USA
- Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia, USA
4
Li J, Deng SP, Vieira J, Thomas J, Costa V, Tseng CS, Ivankovic F, Ciccodicola A, Yu P. RBPMetaDB: a comprehensive annotation of mouse RNA-Seq datasets with perturbations of RNA-binding proteins. Database: The Journal of Biological Databases and Curation 2018; 2018:5040291. [PMID: 29931156; PMCID: PMC6009576; DOI: 10.1093/database/bay054] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/05/2018] [Accepted: 05/17/2018] [Indexed: 01/03/2023]
Abstract
RNA-binding proteins (RBPs) may play a critical role in gene regulation in various diseases and biological processes by controlling post-transcriptional events such as polyadenylation, splicing and mRNA stabilization through binding to RNA molecules. Owing to the importance of RBPs in gene regulation, a great number of studies have been conducted, resulting in a large number of RNA-Seq datasets. However, the metadata of these datasets usually lack a structured organization, which limits their potentially wide use. To bridge this gap, the metadata of a comprehensive set of publicly available mouse RNA-Seq datasets with perturbed RBPs were collected and integrated into a database called RBPMetaDB. This database contains 292 mouse RNA-Seq datasets for a comprehensive list of 187 RBPs. These RBPs account for only ∼10% of all known RBPs annotated in Gene Ontology, indicating that most are still unexplored using high-throughput sequencing. This negative information provides a great pool of candidate RBPs for biologists to conduct future experimental studies. In addition, we found that DNA-binding activities are significantly enriched among RBPs in RBPMetaDB, suggesting that prior studies of these DNA- and RNA-binding factors focused more on DNA-binding activities than on RNA-binding activities. This result reveals the opportunity to efficiently reuse these data to investigate the roles of their RNA-binding activities. A web application has also been implemented to enable easy access and wide use of RBPMetaDB. It is expected that RBPMetaDB will be a great resource for improving understanding of the biological roles of RBPs. Database URL: http://rbpmetadb.yubiolab.org
Affiliation(s)
- Jin Li
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
- Su-Ping Deng
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
- Jacob Vieira
- The Department of Microbiology, University of Massachusetts Amherst, Amherst, MA, USA
- James Thomas
- Department of Molecular Genetics and Microbiology, Center for NeuroGenetics and the Genetics Institute, College of Medicine, University of Florida, Gainesville, FL, USA
- Valerio Costa
- Institute of Genetics and Biophysics "Adriano Buzzati-Traverso", Consiglio Nazionale delle Ricerche, Via P. Castellino 111, 80131 Naples, Italy
- Ching-San Tseng
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 11529, Taiwan
- Franjo Ivankovic
- Department of Molecular Genetics and Microbiology, Center for NeuroGenetics and the Genetics Institute, College of Medicine, University of Florida, Gainesville, FL, USA
- Alfredo Ciccodicola
- Institute of Genetics and Biophysics "Adriano Buzzati-Traverso", Consiglio Nazionale delle Ricerche, Via P. Castellino 111, 80131 Naples, Italy
- Department of Science and Technology, University Parthenope of Naples, 80131 Naples, Italy
- Peng Yu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
5
Li Z, Li J, Yu P. GEOMetaCuration: a web-based application for accurate manual curation of Gene Expression Omnibus metadata. Database: The Journal of Biological Databases and Curation 2018; 2018:4953404. [PMID: 29688376; PMCID: PMC5868185; DOI: 10.1093/database/bay019] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/31/2017] [Accepted: 01/30/2018] [Indexed: 01/15/2023]
Abstract
Metadata curation has become increasingly important for biological discovery and biomedical research because a large amount of heterogeneous biological data is now freely available. To facilitate efficient metadata curation, we developed an easy-to-use web-based curation application, GEOMetaCuration, for curating the metadata of Gene Expression Omnibus datasets. It can eliminate mechanical operations that consume precious curation time and can help coordinate curation efforts among multiple curators. It improves the curation process by introducing various features that are critical to metadata curation, such as a back-end curation management system and a curator-friendly front-end. The application is built on Django, a commonly used Python web development framework, and is open-sourced under the GNU General Public License v3. GEOMetaCuration is expected to benefit the biocuration community and to contribute to the computational generation of biological insights from large-scale biological data. An example use case can be found at the demo website: http://geometacuration.yubiolab.org.
Database URL: https://bitbucket.com/yubiolab/GEOMetaCuration
Affiliation(s)
- Zhao Li
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
- Jin Li
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
- Peng Yu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
6
Li J, Tseng CS, Federico A, Ivankovic F, Huang YS, Ciccodicola A, Swanson MS, Yu P. SFMetaDB: a comprehensive annotation of mouse RNA splicing factor RNA-Seq datasets. Database: The Journal of Biological Databases and Curation 2018; 2017:4161772. [PMID: 29220461; PMCID: PMC5737203; DOI: 10.1093/database/bax071] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/13/2017] [Accepted: 08/15/2017] [Indexed: 02/07/2023]
Abstract
Although the number of publicly deposited RNA-Seq datasets has increased over the past few years, incomplete annotation of the associated metadata limits their potential use. Because of the importance of RNA splicing in diseases and biological processes, we constructed a database called SFMetaDB by curating datasets related to RNA splicing factors. Our effort focused on RNA-Seq datasets in which splicing factors were knocked down, knocked out, or overexpressed, yielding 75 datasets corresponding to 56 splicing factors. These datasets can be used in differential alternative splicing analysis to identify potential targets of these splicing factors and in other functional studies. Surprisingly, only ∼15% of all splicing factors have been studied by loss- or gain-of-function experiments using RNA-Seq. In particular, splicing factors with domains from a few dominant Pfam domain families have not been studied. This suggests a significant gap that needs to be addressed to fully elucidate the splicing regulatory landscape. Indeed, mouse models are already available for ∼20 of the unstudied splicing factors, and studying these splicing factors in vitro and in vivo using RNA-Seq could be a fruitful research direction. Database URL: http://sfmetadb.ece.tamu.edu/
Affiliation(s)
- Jin Li
- Department of Electrical and Computer Engineering
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
- Ching-San Tseng
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Antonio Federico
- Institute of Genetics and Biophysics "Adriano Buzzati Traverso", CNR, Naples, Italy
- Department of Science and Technology, University of Naples "Parthenope", Naples, Italy
- Franjo Ivankovic
- Department of Molecular Genetics and Microbiology, Center for NeuroGenetics and the Genetics Institute, College of Medicine, University of Florida, Gainesville, Florida, USA
- Yi-Shuian Huang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Alfredo Ciccodicola
- Institute of Genetics and Biophysics "Adriano Buzzati Traverso", CNR, Naples, Italy
- Department of Science and Technology, University of Naples "Parthenope", Naples, Italy
- Maurice S Swanson
- Department of Molecular Genetics and Microbiology, Center for NeuroGenetics and the Genetics Institute, College of Medicine, University of Florida, Gainesville, Florida, USA
- Peng Yu
- Department of Electrical and Computer Engineering
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX 77843, USA
7
Li J, Zheng L, Uchiyama A, Bin L, Mauro TM, Elias PM, Pawelczyk T, Sakowicz-Burkiewicz M, Trzeciak M, Leung DYM, Morasso MI, Yu P. A data mining paradigm for identifying key factors in biological processes using gene expression data. Sci Rep 2018; 8:9083. [PMID: 29899432; PMCID: PMC5998123; DOI: 10.1038/s41598-018-27258-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/03/2017] [Accepted: 05/21/2018] [Indexed: 12/15/2022] [Open access]
Abstract
A large volume of biological data is being generated for studying mechanisms of various biological processes. These precious data enable large-scale computational analyses to gain biological insights. However, it remains a challenge to mine the data efficiently for knowledge discovery. The heterogeneity of these data makes it difficult to consistently integrate them, slowing down the process of biological discovery. We introduce a data processing paradigm to identify key factors in biological processes via systematic collection of gene expression datasets, primary analysis of data, and evaluation of consistent signals. To demonstrate its effectiveness, our paradigm was applied to epidermal development and identified many genes that play a potential role in this process. Besides the known epidermal development genes, a substantial proportion of the identified genes are still not supported by gain- or loss-of-function studies, yielding many novel genes for future studies. Among them, we selected a top gene for loss-of-function experimental validation and confirmed its function in epidermal differentiation, proving the ability of this paradigm to identify new factors in biological processes. In addition, this paradigm revealed many key genes in cold-induced thermogenesis using data from cold-challenged tissues, demonstrating its generalizability. This paradigm can lead to fruitful results for studying molecular mechanisms in an era of explosive accumulation of publicly available biological data.
Affiliation(s)
- Jin Li
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, 77843, USA
- Le Zheng
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- Akihiko Uchiyama
- Laboratory of Skin Biology, National Institute for Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Lianghua Bin
- Department of Pediatrics, National Jewish Health, Denver, Colorado, USA
- Theodora M Mauro
- Dermatology Service, Veterans Affairs Medical Center, and Department of Dermatology, UCSF, San Francisco, California, USA
- Peter M Elias
- Dermatology Service, Veterans Affairs Medical Center, and Department of Dermatology, UCSF, San Francisco, California, USA
- Tadeusz Pawelczyk
- Department of Molecular Medicine, Medical University of Gdansk, Gdansk, Poland
- Magdalena Trzeciak
- Department of Dermatology, Venerology and Allergology, Medical University of Gdansk, Gdansk, Poland
- Donald Y M Leung
- Department of Pediatrics, National Jewish Health, Denver, Colorado, USA
- Maria I Morasso
- Laboratory of Skin Biology, National Institute for Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Peng Yu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, TX, 77843, USA
8
Bernstein MN, Doan A, Dewey CN. MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive. Bioinformatics 2017; 33:2914-2923. [PMID: 28535296; PMCID: PMC5870770; DOI: 10.1093/bioinformatics/btx334] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 12/02/2016] [Accepted: 05/21/2017] [Indexed: 01/31/2023] [Open access]
Abstract
Motivation: The NCBI’s Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in aggregate; however, the data remain largely underutilized, in part due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across the diverse diseases, tissues and cell types present in the SRA.
Results: We present MetaSRA, a database of normalized SRA human sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline.
Availability and implementation: The MetaSRA is available at metasra.biostat.wisc.edu via both a searchable web interface and bulk downloads. Software implementing our computational pipeline is available at http://github.com/deweylab/metasra-pipeline.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- AnHai Doan
- Department of Computer Sciences, University of Wisconsin, Madison, WI, USA
- Colin N Dewey
- Department of Computer Sciences, University of Wisconsin, Madison, WI, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA
- To whom correspondence should be addressed.