77
Zepeda-Orozco D, Kong M, Scheuermann RH. Molecular Profile of Mitochondrial Dysfunction in Kidney Transplant Biopsies Is Associated With Poor Allograft Outcome. Transplant Proc 2016; 47:1675-82. [PMID: 26293032 DOI: 10.1016/j.transproceed.2015.04.086] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/18/2015] [Accepted: 04/07/2015] [Indexed: 12/14/2022]
Abstract
BACKGROUND In kidney transplantation (KT), progression of chronic histological damage with subclinical inflammation is associated with poor long-term allograft survival. The role of nonimmunological pathways in chronic allograft injury has not been fully assessed. METHODS We analyzed a public microarray dataset of 1-year protocol kidney transplant biopsy specimens to investigate whether nonimmunological genes and pathways might influence long-term allograft outcome. The selected dataset included 3 patient/sample groups defined by histological findings: normal histology (n = 25), interstitial fibrosis alone (IF alone, n = 24), and interstitial fibrosis with inflammation (IF+i, n = 16). Over a mean follow-up of 4 years, the IF+i group had lower death-censored graft survival and worse renal function. We performed statistical analyses comparing gene expression patterns among the 3 sample groups. RESULTS Gene cluster enrichment and group-specific expression patterns demonstrated divergent behavior of mitochondrial and immune response genes, with downregulation of mitochondrial genes in the IF+i group. Gene ontology analysis of the downregulated mitochondrial genes identified generation of precursor metabolites and energy, and response to oxidative stress, as the most significant biological processes. Transcriptional regulation pathway analysis of the downregulated gene cluster identified transcription factors involved in mitochondrial biogenesis. CONCLUSIONS A molecular signature of mitochondrial dysfunction, reflecting mitochondrial energetic insufficiency and an inadequate antioxidant response involving mitochondrial biogenesis pathways, is associated with IF+i and worse long-term allograft survival. Thus, impaired mitochondrial function appears to be an important nonimmune factor in chronic allograft injury.
78
Brinkman RR, Aghaeepour N, Finak G, Gottardo R, Mosmann T, Scheuermann RH. State-of-the-Art in the Computational Analysis of Cytometry Data. Cytometry A 2016; 87:591-3. [PMID: 26111230 DOI: 10.1002/cyto.a.22707] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 01/23/2023]
79
Krishnaswami SR, Grindberg RV, Novotny M, Venepally P, Lacar B, Bhutani K, Linker SB, Pham S, Erwin JA, Miller JA, Hodge R, McCarthy JK, Kelder M, McCorrison J, Aevermann BD, Fuertes FD, Scheuermann RH, Lee J, Lein ES, Schork N, McConnell MJ, Gage FH, Lasken RS. Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons. Nat Protoc 2016; 11:499-524. [PMID: 26890679 PMCID: PMC4941947 DOI: 10.1038/nprot.2016.015] [Citation(s) in RCA: 260] [Impact Index Per Article: 32.5] [Indexed: 11/08/2022]
Abstract
A protocol is described for sequencing the transcriptome of a cell nucleus. Nuclei are isolated from specimens and sorted by FACS, cDNA libraries are constructed and RNA-seq is performed, followed by data analysis. Some steps follow published methods (Smart-seq2 for cDNA synthesis and Nextera XT barcoded library preparation) and are not described in detail here. Previous single-cell approaches for RNA-seq from tissues include cell dissociation using protease treatment at 30 °C, which is known to alter the transcriptome. We isolate nuclei at 4 °C from tissue homogenates, a process that causes minimal damage. Nuclear transcriptomes can be obtained from postmortem human brain tissue stored at -80 °C, making brain archives accessible for RNA-seq from individual neurons. The method also allows investigation of biological features unique to nuclei, such as enrichment of certain transcripts and precursors of some noncoding RNAs. Following this procedure, cDNA libraries ready for sequencing can be constructed in about 4 d.
80
Finak G, Langweiler M, Jaimes M, Malek M, Taghiyar J, Korin Y, Raddassi K, Devine L, Obermoser G, Pekalski ML, Pontikos N, Diaz A, Heck S, Villanova F, Terrazzini N, Kern F, Qian Y, Stanton R, Wang K, Brandes A, Ramey J, Aghaeepour N, Mosmann T, Scheuermann RH, Reed E, Palucka K, Pascual V, Blomberg BB, Nestle F, Nussenblatt RB, Brinkman RR, Gottardo R, Maecker H, McCoy JP. Standardizing Flow Cytometry Immunophenotyping Analysis from the Human ImmunoPhenotyping Consortium. Sci Rep 2016; 6:20686. [PMID: 26861911 PMCID: PMC4748244 DOI: 10.1038/srep20686] [Citation(s) in RCA: 199] [Impact Index Per Article: 24.9] [Received: 08/19/2015] [Accepted: 01/05/2016] [Indexed: 01/21/2023]
Abstract
Standardization of immunophenotyping requires careful attention to reagents, sample handling, instrument setup, and data analysis, and is essential for successful cross-study and cross-center comparison of data. Experts developed five standardized, eight-color panels for identification of major immune cell subsets in peripheral blood. These were produced as pre-configured, lyophilized reagents in 96-well plates. We present the results of a coordinated analysis of samples across nine laboratories using these panels with standardized operating procedures (SOPs). Manual gating was performed by each site and by a central site. Automated gating algorithms were developed and tested by the FlowCAP consortium. Centralized manual gating can reduce cross-center variability, and we sought to determine whether automated methods could streamline and standardize the analysis. Within-site variability was low in all experiments, but cross-site variability was lower with central analysis than with site-specific analysis. It was also lower for clearly defined cell subsets than for those based on dim markers and for rare populations. Automated gating was able to match the performance of central manual analysis for all tested panels, exhibiting little to no bias and comparable variability. Standardized staining, data collection, and automated gating can increase power, reduce variability, and streamline analysis for immunophenotyping.
81
Brinkman RR, Aghaeepour N, Finak G, Gottardo R, Mosmann T, Scheuermann RH. Automated analysis of flow cytometry data comes of age. Cytometry A 2016; 89:13-5. [DOI: 10.1002/cyto.a.22810] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 12/04/2015] [Accepted: 12/07/2015] [Indexed: 12/31/2022]
82
Aghaeepour N, Chattopadhyay P, Chikina M, Dhaene T, Van Gassen S, Kursa M, Lambrecht BN, Malek M, Qian Y, Qiu P, Saeys Y, Stanton R, Tong D, Vens C, Walkowiak S, Wang K, Finak G, Gottardo R, Mosmann T, Nolan G, Scheuermann RH, Brinkman RR. A benchmark for evaluation of algorithms for identification of cellular correlates of clinical outcomes. Cytometry A 2016; 89:16-21. [PMID: 26447924 PMCID: PMC4874734 DOI: 10.1002/cyto.a.22732] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 01/27/2015] [Revised: 05/20/2015] [Accepted: 07/16/2015] [Indexed: 11/07/2022]
Abstract
The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of computational methods for identifying cell populations in multidimensional flow cytometry data. Here we report the results of FlowCAP-IV where algorithms from seven different research groups predicted the time to progression to AIDS among a cohort of 384 HIV+ subjects, using antigen-stimulated peripheral blood mononuclear cell (PBMC) samples analyzed with a 14-color staining panel. Two approaches (FlowReMi.1 and flowDensity-flowType-RchyOptimyx) provided statistically significant predictive value in the blinded test set. Manual validation of submitted results indicated that unbiased analysis of single cell phenotypes could reveal unexpected cell types that correlated with outcomes of interest in high dimensional flow cytometry datasets.
83
Hsiao C, Liu M, Stanton R, McGee M, Qian Y, Scheuermann RH. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure. Cytometry A 2015; 89:71-88. [PMID: 26274018 PMCID: PMC5014134 DOI: 10.1002/cyto.a.22735] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 08/29/2014] [Revised: 04/26/2015] [Accepted: 07/22/2015] [Indexed: 12/05/2022]
Abstract
Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. 
The FR statistic outperforms the symmetric version of KL‐distance in distinguishing equivalent from nonequivalent cell populations. FlowMap‐FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F‐measure of 0.88 was obtained, indicating high precision and recall of the FR‐based population matching results. FlowMap‐FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry
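The Friedman–Rafsky idea can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions and not the FlowMap‐FR package code, whose R/Bioconductor interface is not reproduced here. The sketch pools the events from two samples, builds a Euclidean minimum spanning tree (MST), and counts the runs R, meaning the subtrees left after deleting every edge that joins events from different samples. With m and n events (N = m + n), the null expectation is E[R] = 2mn/N + 1. Far fewer runs than expected suggests the two populations differ. The standardized FR statistic also needs the conditional variance of R, which this sketch omits. The O(N²) Prim's algorithm and the function names are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean_mst(points: List[Point]) -> List[Tuple[int, int]]:
    """Return the edges of a Euclidean minimum spanning tree (Prim's algorithm, O(N^2))."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n      # tree vertex at the other end of that cheapest edge
    best[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax distances from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def friedman_rafsky_runs(x: List[Point], y: List[Point]) -> Tuple[int, float]:
    """Return (observed runs R, expected runs under the null) for two multivariate samples."""
    pooled = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    # Edges joining points from different samples; deleting them leaves cross + 1 subtrees.
    cross = sum(1 for a, b in euclidean_mst(pooled) if labels[a] != labels[b])
    m, n = len(x), len(y)
    expected = 2 * m * n / (m + n) + 1  # E[R] = 2mn/N + 1 under the null hypothesis
    return cross + 1, expected


if __name__ == "__main__":
    separated_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
    separated_y = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(friedman_rafsky_runs(separated_x, separated_y))  # (2, 4.0)
```

Two well-separated clouds of three events each give 2 runs against an expected 4. Perfectly interleaved samples give the maximum of N runs. A quadratic-time MST keeps the sketch dependency-free. For realistic event counts, a faster MST would likely be needed.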
84
Kvistborg P, Gouttefangeas C, Aghaeepour N, Cazaly A, Chattopadhyay PK, Chan C, Eckl J, Finak G, Hadrup SR, Maecker HT, Maurer D, Mosmann T, Qiu P, Scheuermann RH, Welters MJP, Ferrari G, Brinkman RR, Britten CM. Thinking outside the gate: single-cell assessments in multiple dimensions. Immunity 2015; 42:591-2. [PMID: 25902473 PMCID: PMC4824634 DOI: 10.1016/j.immuni.2015.04.006] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 01/20/2023]
85
Miller MA, Schwartz T, Pickett BE, He S, Klem EB, Scheuermann RH, Passarotti M, Kaufman S, O’Leary MA. A RESTful API for Access to Phylogenetic Tools via the CIPRES Science Gateway. Evol Bioinform Online 2015; 11:43-8. [PMID: 25861210 PMCID: PMC4362911 DOI: 10.4137/ebo.s21501] [Citation(s) in RCA: 271] [Impact Index Per Article: 30.1] [Received: 11/05/2014] [Revised: 01/19/2015] [Accepted: 01/24/2015] [Indexed: 11/12/2022]
Abstract
The CIPRES Science Gateway is a community web application that provides public access to a set of parallel tree inference and multiple sequence alignment codes run on large computational resources. These resources are made available at no charge to users by the NSF Extreme Science and Engineering Discovery Environment (XSEDE) project. Here we describe the CIPRES RESTful application programmer interface (CRA), a web service that provides programmatic access to all resources and services currently offered by the CIPRES Science Gateway. Software developers can use the CRA to extend their web or desktop applications to include the ability to run MrBayes, BEAST, RAxML, MAFFT, and other computationally intensive algorithms on XSEDE. The CRA also makes it possible for individuals with modest scripting skills to access the same tools from the command line using curl, or through any scripting language. This report describes the CRA and its use in three web applications (Influenza Research Database - www.fludb.org, Virus Pathogen Resource - www.viprbrc.org, and MorphoBank - www.morphobank.org). The CRA is freely accessible to registered users at https://cipresrest.sdsc.edu/cipresrest/v1; supporting documentation and registration tools are available at https://www.phylo.org/restusers.
86
Courtot M, Meskas J, Diehl AD, Droumeva R, Gottardo R, Jalali A, Taghiyar MJ, Maecker HT, McCoy JP, Ruttenberg A, Scheuermann RH, Brinkman RR. flowCL: ontology-based cell population labelling in flow cytometry. Bioinformatics 2014; 31:1337-9. [PMID: 25481008 DOI: 10.1093/bioinformatics/btu807] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/29/2014] [Accepted: 12/02/2014] [Indexed: 11/13/2022]
Abstract
MOTIVATION Finding one or more cell populations of interest, such as those correlating to a specific disease, is critical when analysing flow cytometry data. However, labelling of cell populations is not well defined, making it difficult to integrate the output of algorithms with external knowledge sources. RESULTS We developed flowCL, a software package that performs semantic labelling of cell populations based on their surface markers, and applied it to labelling of the Federation of Clinical Immunology Societies Human Immunology Project Consortium lyoplate populations as a use case. CONCLUSION By providing automated labelling of cell populations based on their immunophenotype, flowCL allows for unambiguous and reproducible identification of standardized cell types. AVAILABILITY AND IMPLEMENTATION Code, R script and documentation are available under the Artistic 2.0 license through Bioconductor (http://www.bioconductor.org/packages/devel/bioc/html/flowCL.html). CONTACT rbrinkman@bccrc.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
87
Aevermann BD, Pickett BE, Kumar S, Klem EB, Agnihothram S, Askovich PS, Bankhead A, Bolles M, Carter V, Chang J, Clauss TRW, Dash P, Diercks AH, Eisfeld AJ, Ellis A, Fan S, Ferris MT, Gralinski LE, Green RR, Gritsenko MA, Hatta M, Heegel RA, Jacobs JM, Jeng S, Josset L, Kaiser SM, Kelly S, Law GL, Li C, Li J, Long C, Luna ML, Matzke M, McDermott J, Menachery V, Metz TO, Mitchell H, Monroe ME, Navarro G, Neumann G, Podyminogin RL, Purvine SO, Rosenberger CM, Sanders CJ, Schepmoes AA, Shukla AK, Sims A, Sova P, Tam VC, Tchitchek N, Thomas PG, Tilton SC, Totura A, Wang J, Webb-Robertson BJ, Wen J, Weiss JM, Yang F, Yount B, Zhang Q, McWeeney S, Smith RD, Waters KM, Kawaoka Y, Baric R, Aderem A, Katze MG, Scheuermann RH. A comprehensive collection of systems biology data characterizing the host response to viral infection. Sci Data 2014; 1:140033. [PMID: 25977790 PMCID: PMC4410982 DOI: 10.1038/sdata.2014.33] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 03/04/2014] [Accepted: 08/15/2014] [Indexed: 12/13/2022]
Abstract
The Systems Biology for Infectious Diseases Research program was established by the U.S. National Institute of Allergy and Infectious Diseases to investigate host-pathogen interactions at a systems level. This program generated 47 transcriptomic and proteomic datasets from 30 studies investigating in vivo and in vitro host responses to viral infections. Human pathogens in the Orthomyxoviridae and Coronaviridae families, especially pandemic H1N1 and avian H5N1 influenza A viruses and severe acute respiratory syndrome coronavirus (SARS-CoV), were investigated. Study validation was demonstrated via experimental quality control measures and meta-analysis of independent experiments performed under similar conditions. Primary assay results are archived at the GEO and PeptideAtlas public repositories, while processed statistical results together with standardized metadata are publicly available at the Influenza Research Database (www.fludb.org) and the Virus Pathogen Resource (www.viprbrc.org). By comparing data from mutant versus wild-type virus and host strains, RNA versus protein differential expression, and infection with genetically similar strains, these data can be used to further investigate genetic and physiological determinants of host responses to viral infection.
88
Squires RB, Pickett BE, Das S, Scheuermann RH. Toward a method for tracking virus evolutionary trajectory applied to the pandemic H1N1 2009 influenza virus. Infect Genet Evol 2014; 28:351-7. [PMID: 25064525 DOI: 10.1016/j.meegid.2014.07.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/21/2014] [Revised: 07/06/2014] [Accepted: 07/15/2014] [Indexed: 11/28/2022]
Abstract
In 2009 a novel pandemic H1N1 influenza virus (H1N1pdm09) emerged as the first official influenza pandemic of the 21st century. Early genomic sequence analysis pointed to the swine origin of the virus. Here we report a novel computational approach to determine the evolutionary trajectory of viral sequences that uses data-driven estimations of nucleotide substitution rates to track the gradual accumulation of observed sequence alterations over time. Phylogenetic analysis and multiple sequence alignments show that sequences belonging to the resulting evolutionary trajectory of the H1N1pdm09 lineage exhibit a gradual accumulation of sequence variations and tight temporal correlations in the topological structure of the phylogenetic trees. These results suggest that our evolutionary trajectory analysis (ETA) can more effectively pinpoint the evolutionary history of viruses, including the host and geographical location traversed by each segment, when compared against either BLAST or traditional phylogenetic analysis alone.
89
Dugan VG, Emrich SJ, Giraldo-Calderón GI, Harb OS, Newman RM, Pickett BE, Schriml LM, Stockwell TB, Stoeckert CJ, Sullivan DE, Singh I, Ward DV, Yao A, Zheng J, Barrett T, Birren B, Brinkac L, Bruno VM, Caler E, Chapman S, Collins FH, Cuomo CA, Di Francesco V, Durkin S, Eppinger M, Feldgarden M, Fraser C, Fricke WF, Giovanni M, Henn MR, Hine E, Hotopp JD, Karsch-Mizrachi I, Kissinger JC, Lee EM, Mathur P, Mongodin EF, Murphy CI, Myers G, Neafsey DE, Nelson KE, Nierman WC, Puzak J, Rasko D, Roos DS, Sadzewicz L, Silva JC, Sobral B, Squires RB, Stevens RL, Tallon L, Tettelin H, Wentworth D, White O, Will R, Wortman J, Zhang Y, Scheuermann RH. Standardized metadata for human pathogen/vector genomic sequences. PLoS One 2014; 9:e99979. [PMID: 24936976 PMCID: PMC4061050 DOI: 10.1371/journal.pone.0099979] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 02/06/2014] [Accepted: 05/15/2014] [Indexed: 11/18/2022]
Abstract
High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
90
Gabriel VA, McClellan EA, Scheuermann RH. Response of human skin to esthetic scarification. Burns 2014; 40:1338-44. [PMID: 24582755 DOI: 10.1016/j.burns.2014.01.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/13/2013] [Revised: 11/01/2013] [Accepted: 01/11/2014] [Indexed: 10/25/2022]
Abstract
This study was undertaken to investigate changes in RNA expression in previously healthy adult human skin following thermal injury induced by contact with hot metal as part of esthetic scarification, a body modification practice. Subjects were recruited to undergo pre-injury skin biopsy and serial wound biopsies. Punch biopsies (4 mm) were taken before branding and at 1 h, 1 week, and 1, 2, and 3 months after injury. RNA was extracted and quality assured before a whole-genome bead array platform was used to describe expression changes in the samples, with pre-injury skin as the comparator. The array data were analyzed using k-means clustering and a hypergeometric probability distribution without replacement, with correction for multiple comparisons. Confirmatory q-PCR was performed. With k = 10, several clusters contained genes that co-clustered by Gene Ontology classification with probabilities unlikely to occur by chance alone. Of particular interest were clusters relating to cell cycle, proteinaceous extracellular matrix, and keratinization. Given the consistent expression changes in the cell cycle cluster at 1 week after injury, there is an opportunity to intervene early after burn injury to influence scar development.
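The hypergeometric test for whether a Gene Ontology term is over-represented in a k-means cluster can be sketched in Python. This is a minimal illustration under assumptions, not the authors' analysis code. X counts annotated genes among n cluster genes drawn without replacement from N array genes, K of which carry the annotation. The abstract does not name the multiple-comparison correction, so Bonferroni is shown only as one common, hypothetical choice. The gene counts in the usage lines are likewise hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n): n draws without replacement
    from N genes, of which K carry a given GO annotation."""
    # comb() returns 0 when its second argument exceeds the first,
    # so impossible terms drop out automatically.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def bonferroni(p: float, num_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1 (one possible multiple-comparison correction)."""
    return min(1.0, p * num_tests)


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes on the array, 300 annotated to a GO term,
    # and a cluster of 150 genes containing 12 of them.
    p = hypergeom_upper_tail(k=12, N=20_000, K=300, n=150)
    print(p, bonferroni(p, num_tests=10))
```

Exact integer binomial coefficients avoid the cancellation problems a naive floating-point product would introduce.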
91
Spidlen J, Barsky A, Breuer K, Carr P, Nazaire MD, Hill BA, Qian Y, Liefeld T, Reich M, Mesirov JP, Wilkinson P, Scheuermann RH, Sekaly RP, Brinkman RR. GenePattern flow cytometry suite. Source Code Biol Med 2013; 8:14. [PMID: 23822732 PMCID: PMC3717030 DOI: 10.1186/1751-0473-8-14] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 01/10/2013] [Accepted: 06/21/2013] [Indexed: 01/08/2023]
Abstract
BACKGROUND Traditional flow cytometry data analysis is largely based on interactive, time-consuming analysis of a series of two-dimensional representations of up to 20-dimensional data. Recent technological advances have increased the amount of data generated by the technology and outpaced the development of data analysis approaches. While advanced tools are available, including many R/BioConductor packages, these are accessible only programmatically and are therefore out of reach for most experimentalists. GenePattern is a powerful genomic analysis platform with over 200 tools for analysis of gene expression, proteomics, and other data. A web-based interface provides easy access to these tools and allows the creation of automated analysis pipelines enabling reproducible research. RESULTS To bring advanced flow cytometry data analysis tools to experimentalists without programming skills, we developed the GenePattern Flow Cytometry Suite. It contains 34 open source GenePattern flow cytometry modules covering methods from basic processing of flow cytometry standard (FCS) files to advanced algorithms for automated identification of cell populations, normalization, and quality assessment. Internally, these modules leverage functionality developed in R/BioConductor. Using the GenePattern web-based interface, they can be connected to build analytical pipelines. CONCLUSIONS GenePattern Flow Cytometry Suite brings advanced flow cytometry data analysis capabilities to users with minimal computer skills. Functionality previously available only to skilled bioinformaticians is now easily accessible from a web browser.
92
Rink B, Roberts K, Harabagiu S, Scheuermann RH, Toomay S, Browning T, Bosler T, Peshock R. Extracting actionable findings of appendicitis from radiology reports using natural language processing. AMIA Jt Summits Transl Sci Proc 2013; 2013:221. [PMID: 24303268 PMCID: PMC3845763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2022]
Abstract
Radiology reports often contain findings about the condition of a patient which should be acted upon quickly. These actionable findings in a radiology report can be automatically detected to ensure that the referring physician is notified about such findings and to provide feedback to the radiologist that further action has been taken. In this paper we investigate a method for detecting actionable findings of appendicitis in radiology reports. The method identifies both individual assertions regarding the presence of appendicitis and other findings related to appendicitis using syntactic dependency patterns. All relevant individual statements from a report are collectively considered to determine whether the report is consistent with appendicitis. Evaluation on a corpus of 400 radiology reports annotated by two expert radiologists showed that our approach achieves a precision of 91%, a recall of 83%, and an F1-measure of 87%.
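As a quick consistency check, assume the standard balanced definition of the F1-measure as the harmonic mean of precision $P$ and recall $R$. Substituting the reported values then reproduces the stated 87%:

$$
F_1 = \frac{2PR}{P + R} = \frac{2(0.91)(0.83)}{0.91 + 0.83} = \frac{1.5106}{1.74} \approx 0.868
$$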
93
Aghaeepour N, Finak G, Hoos H, Mosmann TR, Brinkman R, Gottardo R, Scheuermann RH. Critical assessment of automated flow cytometry data analysis techniques. Nat Methods 2013; 10:228-38. [PMID: 23396282 PMCID: PMC3906045 DOI: 10.1038/nmeth.2365] [Citation(s) in RCA: 350] [Impact Index Per Article: 31.8] [Received: 05/08/2012] [Accepted: 01/14/2013] [Indexed: 12/14/2022]
Abstract
In this analysis, the authors directly compared the performance of flow cytometry data processing algorithms to manual gating approaches. The results offer information of practical utility about the performance of the algorithms as applied to different data sets and challenges. Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks: (i) mammalian cell population identification, to determine whether automated algorithms can reproduce expert manual gating and (ii) sample classification, to determine whether analysis pipelines can identify characteristics that correlate with external variables (such as clinical outcome). This analysis presents the results of the first FlowCAP challenges. Several methods performed well as compared to manual gating or external variables using statistical performance measures, which suggests that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
94
Roberts K, Rink B, Harabagiu SM, Scheuermann RH, Toomay S, Browning T, Bosler T, Peshock R. A machine learning approach for identifying anatomical locations of actionable findings in radiology reports. AMIA Annu Symp Proc 2012; 2012:779-88. [PMID: 23304352 PMCID: PMC3540484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Recognizing the anatomical location of actionable findings in radiology reports is an important part of the communication of critical test results between caregivers. One of the difficulties of identifying anatomical locations of actionable findings stems from the fact that anatomical locations are not always stated in a simple, easy-to-identify manner. Natural language processing techniques are capable of recognizing the relevant anatomical location by processing a diverse set of lexical and syntactic contexts that correspond to the various ways that radiologists represent spatial relations. We report a precision of 86.2%, recall of 85.9%, and F1-measure of 86.0% for extracting the anatomical site of an actionable finding. Additionally, we report a precision of 73.8%, recall of 69.8%, and F1-measure of 71.8% for extracting an additional anatomical site that grounds underspecified locations. This demonstrates promising results for identifying locations, while error analysis reveals challenges under certain contexts. Future work will focus on incorporating new forms of medical language processing to improve performance and transitioning our method to new types of clinical data.
95
Qian Y, Liu Y, Campbell J, Thomson E, Kong YM, Scheuermann RH. FCSTrans: an open source software system for FCS file conversion and data transformation. Cytometry A 2012; 81:353-356. [PMID: 22431383 DOI: 10.1002/cyto.a.22037] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 12/01/2011] [Revised: 02/03/2012] [Accepted: 02/14/2012] [Indexed: 11/09/2022]
96
Squires RB, Noronha J, Hunt V, García-Sastre A, Macken C, Baumgarth N, Suarez D, Pickett BE, Zhang Y, Larsen CN, Ramsey A, Zhou L, Zaremba S, Kumar S, Deitrich J, Klem E, Scheuermann RH. Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza Other Respir Viruses 2012; 6:404-16. [PMID: 22260278 PMCID: PMC3345175 DOI: 10.1111/j.1750-2659.2011.00331.x] [Citation(s) in RCA: 244] [Impact Index Per Article: 20.3] [Indexed: 12/03/2022]
Abstract
Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416. Background The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user‐friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal login‐protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross‐protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks.
Conclusions The IRD provides a wealth of integrated data and information about influenza virus to support research on the genetic determinants of virus pathogenicity, host range restriction, and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics.
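The abstract does not say how IRD quantifies sequence conservation. One common choice is per-position Shannon entropy over a multiple sequence alignment, with low-entropy positions treated as conserved. The sketch below assumes that approach and intersects conserved positions with a set of epitope positions; the function names, the 0.5-bit threshold, and the toy sequences are hypothetical and not part of IRD.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; 0.0 means fully conserved."""
    counts = Counter(column)
    total = len(column)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def conserved_epitope_positions(alignment, epitope_positions, max_entropy=0.5):
    """Return 0-based alignment positions that lie in an epitope and are conserved.

    alignment: list of equal-length aligned protein sequences.
    epitope_positions: set of 0-based positions covered by known epitopes.
    max_entropy: hypothetical conservation threshold in bits.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for pos in range(length):
        column = [seq[pos] for seq in alignment]
        if pos in epitope_positions and column_entropy(column) <= max_entropy:
            result.append(pos)
    return result
```

For example, with the toy alignment `["MKAIL", "MKAVL", "MKTIL", "MRAIL"]` and epitope positions `{0, 1, 2, 3}`, only position 0 (invariant `M`) is returned; positions 1 to 3 each carry one substitution and an entropy of about 0.81 bits, and position 4 is conserved but outside the epitope set.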
97
Pickett BE, Sadat EL, Zhang Y, Noronha JM, Squires RB, Hunt V, Liu M, Kumar S, Zaremba S, Gu Z, Zhou L, Larson CN, Dietrich J, Klem EB, Scheuermann RH. ViPR: an open bioinformatics database and analysis resource for virology research. Nucleic Acids Res 2012; 40:D593-8. [PMID: 22006842 PMCID: PMC3245011 DOI: 10.1093/nar/gkr859] [Citation(s) in RCA: 482] [Impact Index Per Article: 40.2] [Received: 08/15/2011] [Revised: 09/21/2011] [Accepted: 09/23/2011] [Indexed: 01/18/2023]
Abstract
The Virus Pathogen Database and Analysis Resource (ViPR, www.ViPRbrc.org) is an integrated repository of data and analysis tools for multiple virus families, supported by the National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Centers (BRC) program. ViPR contains information for human pathogenic viruses belonging to the Arenaviridae, Bunyaviridae, Caliciviridae, Coronaviridae, Flaviviridae, Filoviridae, Hepeviridae, Herpesviridae, Paramyxoviridae, Picornaviridae, Poxviridae, Reoviridae, Rhabdoviridae and Togaviridae families, with plans to support additional virus families in the future. ViPR captures various types of information, including sequence records, gene and protein annotations, 3D protein structures, immune epitope locations, clinical and surveillance metadata and novel data derived from comparative genomics analysis. Analytical and visualization tools for metadata-driven statistical sequence analysis, multiple sequence alignment, phylogenetic tree construction, BLAST comparison and sequence variation determination are also provided. Data filtering and analysis workflows can be combined and the results saved in personal 'Workbenches' for future use. ViPR tools and data are available without charge as a service to the virology research community to help facilitate the development of diagnostics, prophylactics and therapeutics for priority pathogens and other viruses.
98
Chen Z, Liu Q, McGee M, Kong M, Huang X, Deng Y, Scheuermann RH. A gene selection method for GeneChip array data with small sample sizes. BMC Genomics 2011; 12 Suppl 5:S7. [PMID: 22369149 PMCID: PMC3287503 DOI: 10.1186/1471-2164-12-s5-s7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]
Abstract
Background In microarray experiments with small sample sizes, it is challenging to estimate p-values accurately and to choose appropriate cutoff p-values for gene selection. Although permutation-based methods have been shown to have greater sensitivity and specificity than the regular t-test, their p-values are highly discrete because very small samples allow only a limited number of permutations. Furthermore, permutation-based p-values estimated for true nulls are highly correlated and not uniformly distributed between zero and one, which makes current false discovery rate (FDR)-controlling methods difficult to use. Results We propose a model-based information sharing method (MBIS) that, after an appropriate data transformation, uses information shared among genes. We use a normal distribution to model the mean differences of true nulls across two experimental conditions, and estimate the model parameters from all of the available data. Based on this model, we calculate p-values that are uniformly distributed for true nulls. Then, because FDR-controlling methods are generally not well suited to microarray data with very small sample sizes, we select genes for a given cutoff p-value and then estimate the false discovery rate. Conclusion Simulation studies and analysis of real microarray data show that the proposed method, MBIS, is more powerful and reliable than current methods. It can be applied in a wide variety of situations.
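A minimal sketch of the information-sharing idea, under stated assumptions: per-gene mean differences (already on a transformed, e.g. log, scale) are pooled to fit a single normal null N(μ, σ²), with μ and σ estimated robustly by the median and the scaled median absolute deviation on the assumption that most genes are null. Two-sided p-values follow from that fitted null, genes are selected at a fixed cutoff, and the FDR is estimated as m · cutoff / (number selected). This is not the authors' MBIS estimator, whose exact parameter estimation is not given in the abstract; `mbis_like_selection` is a hypothetical name.

```python
from statistics import NormalDist, median


def mbis_like_selection(diffs, p_cutoff=0.01):
    """Pool per-gene mean differences to fit a normal null, then select genes.

    diffs: per-gene mean differences between two conditions (transformed scale).
    Returns (selected gene indices, p-values, estimated FDR).
    """
    center = median(diffs)
    # Median absolute deviation scaled by 1.4826 estimates a normal SD;
    # this assumes most genes are true nulls.
    sigma = 1.4826 * median(abs(d - center) for d in diffs)
    if sigma == 0:
        raise ValueError("cannot estimate null spread: MAD is zero")
    std_normal = NormalDist()
    # Two-sided p-value of each gene under the shared fitted null
    pvals = [2 * (1 - std_normal.cdf(abs(d - center) / sigma)) for d in diffs]
    selected = [i for i, p in enumerate(pvals) if p <= p_cutoff]
    # Conservative FDR: expected null hits if all m genes were null, over observed hits
    fdr = min(1.0, len(diffs) * p_cutoff / len(selected)) if selected else 0.0
    return selected, pvals, fdr
```

Because every gene is treated as null when estimating the FDR (π0 = 1), the estimate is conservative; a fuller treatment would also estimate the proportion of true nulls.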
99
Huang J, Mirel D, Pugh E, Xing C, Robinson PN, Pertsemlidis A, Ding L, Kozlitina J, Maher J, Rios J, Story M, Marthandan N, Scheuermann RH. Minimum Information about a Genotyping Experiment (MIGEN). Stand Genomic Sci 2011; 5:224-9. [PMID: 22180825 PMCID: PMC3235517 DOI: 10.4056/sigs.1994602] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
Genotyping experiments are widely used in clinical and basic research laboratories to identify associations between genetic variations and normal or abnormal phenotypes. Genotyping assay techniques range from PCR-based interrogation of single genomic regions to high-throughput assays examining genome-wide sequence and structural variation. The resulting genotype data may include millions of markers for thousands of individuals, requiring various statistical, modeling, or other data analysis methodologies to interpret the results. To date, there are no standards for reporting genotyping experiments. Here we present the Minimum Information about a Genotyping Experiment (MIGen) standard, which defines the minimum information required for reporting genotyping experiments. The MIGen standard covers experimental design, subject description, genotyping procedure, quality control, and data analysis. MIGen is a registered project under MIBBI (Minimum Information for Biological and Biomedical Investigations) and is being developed by an interdisciplinary group of experts in basic biomedical science, clinical science, biostatistics, and bioinformatics. To accommodate the wide variety of techniques and methodologies applied in current and future genotyping experiments, MIGen leverages foundational concepts from the Ontology for Biomedical Investigations (OBI) to describe the various types of planned processes, and it implements a hierarchical document structure. Adoption of MIGen by the research community will facilitate consistent interpretation of genotyping data and independent data validation. MIGen can also serve as a framework for developing data models that capture and store genotyping results and experiment metadata in a structured way, facilitating metadata exchange.
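To illustrate the abstract's point that MIGen can serve as a framework for structured data models, the sketch below defines a flat record with one required field per area the standard is said to cover, plus a check for empty sections. The class and field names are hypothetical; the actual standard uses a hierarchical document structure and OBI terms, neither of which is modeled here.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GenotypingReport:
    """Hypothetical record: one field per MIGen area named in the abstract."""
    experimental_design: str
    subject_description: str
    genotyping_procedure: str
    quality_control: str
    data_analysis: str
    extra: dict = field(default_factory=dict)  # optional, free-form metadata

    def missing_sections(self):
        """Names of required sections that are empty or whitespace-only."""
        return [f.name for f in fields(self)
                if f.name != "extra" and not getattr(self, f.name).strip()]
```

For instance, `GenotypingReport("case-control design", "toy cohort", "SNP array", "", "association test").missing_sections()` returns `["quality_control"]`, flagging the record as incomplete before it is exchanged.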
100
Mack SJ, Guidry PA, Marthandan N, Smith T, Campbell J, Dunn P, Karp DR, Single RM, Thomson G, Wiser J, Scheuermann RH, Erlich HA. 200-P The ImmPort ambiguity resolution tool: A frequency-based approach to resolving allelic and genotypic ambiguity in HLA genotype data. Hum Immunol 2011. [DOI: 10.1016/j.humimm.2011.07.225] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
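The entry gives only a title, so the following is an assumption about what a frequency-based approach could look like: each candidate genotype consistent with an ambiguous typing result is scored by its expected population frequency under Hardy-Weinberg proportions (p² for a homozygote, 2pq for a heterozygote), and the highest-scoring candidate is kept. This is not the documented algorithm of the ImmPort tool; the function name, allele labels, and frequencies are hypothetical.

```python
def resolve_ambiguity(candidates, allele_freqs):
    """Return the candidate genotype with the highest expected frequency, and that frequency.

    candidates: list of (allele1, allele2) pairs consistent with the typing result.
    allele_freqs: dict mapping allele name -> population allele frequency.
    Alleles absent from the table are given frequency 0.
    """
    if not candidates:
        raise ValueError("no candidate genotypes supplied")

    def expected(pair):
        a, b = pair
        p = allele_freqs.get(a, 0.0)
        q = allele_freqs.get(b, 0.0)
        # Hardy-Weinberg proportions: homozygote p^2, heterozygote 2pq
        return p * p if a == b else 2 * p * q

    best = max(candidates, key=expected)  # ties keep the first-listed candidate
    return best, expected(best)


# Toy, made-up allele frequencies for illustration only
freqs = {"X1": 0.30, "X2": 0.10, "Y1": 0.05, "Y2": 0.20}
print(resolve_ambiguity([("X1", "Y1"), ("X2", "Y2")], freqs))
```

Here ("X2", "Y2") wins with an expected frequency of about 0.04 against about 0.03 for ("X1", "Y1"), even though X1 is the most common single allele, which shows why scoring whole genotypes differs from picking the most frequent allele at each position.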