1
A Rapid and Affordable Screening Tool for Early-Stage Ovarian Cancer Detection Based on MALDI-ToF MS of Blood Serum. Appl Sci 2022. [DOI: 10.3390/app12063030]
Abstract
Ovarian cancer is a worldwide health issue that grows at a rate of almost 250,000 new cases every year. Its early detection is key for a good prognosis and even curative surgery. However, current medical examination methods and tests have been inefficient in detecting ovarian cancer at the early stage, leading to preventable death. So far, new screening tests based on molecular biomarker analysis techniques have not resulted in any substantial improvement in early-stage diagnosis and increased survival. Thus, whilst there remains clear potential to improve outcomes through early detection, novel approaches are needed. Here, we postulated that MALDI-ToF-mass-spectrometry-based tests can be a solution for effective screening of ovarian cancer. In this retrospective cohort study, we generated and analyzed the mass spectra of 181 serum samples of women with and without ovarian cancer. Using bioinformatics pipelines for analysis, including predictive modeling and machine learning, we found distinct mass spectral patterns composed of 9–20 key combinations of peak intensity or peak enrichment features for each stage of ovarian cancer. Based on a scoring algorithm and obtained patterns, the optimal sensitivity for detecting each stage of cancer was 95–97% with a specificity of 97%. Scoring all algorithms simultaneously could detect all stages of ovarian cancer at 99% sensitivity and 92% specificity. The results further demonstrate that the matrix and mass range analyzed played a key role in improving the mass spectral data quality and diagnostic power. Altogether, with the results reported here and increasing evidence of the MS assay’s diagnostic accuracy and instrument robustness, it is timely to consider MS for clinical application in ovarian cancer screening.
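The abstract reports per-stage sensitivity and specificity and a combined result from scoring all algorithms simultaneously, but it does not describe the scoring algorithm. As a hedged illustration only, the Python sketch below shows how the two metrics are computed and assumes, hypothetically, that the combined rule calls a sample positive when any per-stage model flags it. The function names, labels and calls are invented for illustration and are not data from the study.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); labels are 1 = cancer, 0 = no cancer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def any_positive(calls_per_model: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical combination rule: positive if ANY per-stage model calls the sample positive."""
    return [int(any(calls)) for calls in zip(*calls_per_model)]


# Invented toy data: 4 cancer samples followed by 4 cancer-free samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0]  # stand-in for a model tuned to one stage
model_b = [0, 1, 1, 1, 0, 0, 0, 1]  # stand-in for a model tuned to another stage
combined = any_positive([model_a, model_b])

print(sensitivity_specificity(y_true, model_a))   # (0.5, 1.0)
print(sensitivity_specificity(y_true, model_b))   # (0.75, 0.75)
print(sensitivity_specificity(y_true, combined))  # (1.0, 0.75)
```

Under this OR rule the combined call is at least as sensitive as its most sensitive member and at most as specific as its least specific member, which matches the direction of the reported change (higher sensitivity, lower specificity); whether the authors used such a rule is not stated in the abstract.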
2
Liang Y, Kelemen A, Kelemen A. Reproducibility of biomarker identifications from mass spectrometry proteomic data in cancer studies. Stat Appl Genet Mol Biol 2019; 18:sagmb-2018-0039. [PMID: 31077580 DOI: 10.1515/sagmb-2018-0039]
Abstract
Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis has been a key challenge due to a multitude of factors. The heterogeneity of the limited sample, various biological factors such as environmental confounders, and the inherent experimental and technical noises, compounded with the inadequacy of statistical tools, can lead to the misinterpretation of results, and subsequently very different biology. In this paper, we investigate the biomarker reproducibility issues, potentially caused by differences of statistical methods with varied distribution assumptions or marker selection criteria using Mass Spectrometry proteomic ovarian tumor data. We examine the relationship between effect sizes, p values, Cauchy p values, False Discovery Rate p values, and the rank fractions of identified proteins out of thousands in the limited heterogeneous sample. We compared the markers identified from statistical single features selection approaches with machine learning wrapper methods. The results reveal marked differences when selecting the protein markers from varied methods with potential selection biases and false discoveries, which may be due to the small effects, different distribution assumptions, and p value type criteria versus prediction accuracies. The alternative solutions and other related issues are discussed in supporting the reproducibility of findings for clinical actionable outcomes.
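The abstract compares raw p values with "False Discovery Rate p values" without naming the procedure. A common choice is the Benjamini-Hochberg step-up adjustment, in which the p value of rank k among m tests becomes the minimum of p_(j) * m / j over all ranks j >= k. The sketch below assumes that procedure; the p values are invented and the function name is illustrative.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # invented p values for four candidate peaks
print([round(q, 4) for q in benjamini_hochberg(raw)])  # [0.04, 0.0533, 0.0533, 0.2]
```

Note how the third-ranked p value (0.04) pulls the second-ranked one (0.03) up to the same adjusted value: the adjustment depends on each marker's rank among all tests, one reason marker lists selected by adjusted and unadjusted criteria can differ.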
Affiliation(s)
- Yulan Liang
- Department of Family and Community Health, University of Maryland, Baltimore, MD 21201-1579, USA
- Adam Kelemen
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Arpad Kelemen
- Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD 21201-1579, USA
3
Morris JS, Baladandayuthapani V. Statistical Contributions to Bioinformatics: Design, Modeling, Structure Learning, and Integration. Stat Modelling 2017; 17:245-289. [PMID: 29129969 PMCID: PMC5679480 DOI: 10.1177/1471082x17698255]
Abstract
The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data, whose analysis poses significant quantitative challenges. The field of Bioinformatics has emerged to deal with these challenges, and comprises many quantitative and biological scientists working together to effectively process these data and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, focusing on four areas: (1) experimental design and reproducibility, (2) preprocessing and feature extraction, (3) unified modeling, and (4) structure learning and integration. In each of these areas, we highlight some key contributions and try to elucidate the key statistical principles underlying these methods and approaches. Our goals are to demonstrate major ways in which statisticians have contributed to bioinformatics, to encourage statisticians to get involved early in methods development as new technologies emerge, and to stimulate future methodological work based on the statistical principles elucidated in this article and utilizing all available information to uncover new biological insights.
Affiliation(s)
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
4
Duncan MW, Nedelkov D, Walsh R, Hattan SJ. Applications of MALDI Mass Spectrometry in Clinical Chemistry. Clin Chem 2015; 62:134-43. [PMID: 26585930 DOI: 10.1373/clinchem.2015.239491] [Received: 08/11/2015] [Accepted: 11/02/2015]
Abstract
BACKGROUND MALDI-TOF mass spectrometry (MS) is set to make inroads into clinical chemistry because it offers advantages over other analytical platforms. These advantages include low acquisition and operating costs, ease of use, ruggedness, and high throughput. When coupled with innovative front-end strategies and applied to important clinical problems, it can deliver rapid, sensitive, and cost-effective assays. CONTENT This review describes the general principles of MALDI-TOF MS, highlights the unique features of the platform, and discusses some practical methods based upon it. There is substantial potential for MALDI-TOF MS to make further inroads into clinical chemistry because of the selectivity of mass detection and its ability to independently quantify proteoforms. SUMMARY MALDI-TOF MS has already transformed the practice of clinical microbiology and this review illustrates how and why it is now set to play an increasingly important role in in vitro diagnostics in particular, and clinical chemistry in general.
Affiliation(s)
- Mark W Duncan
- Division of Endocrinology, Diabetes & Metabolism, Department of Medicine, School of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO; Obesity Research Center, College of Medicine, King Saud University, Riyadh 11461, Saudi Arabia
- Dobrin Nedelkov
- Molecular Biomarkers Laboratory, Biodesign Institute, Arizona State University, Tempe, AZ
- Ryan Walsh
- Division of Endocrinology, Diabetes & Metabolism, Department of Medicine, School of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO
5
Cairns DA. Statistical issues in the design and planning of proteomic profiling experiments. Methods Mol Biol 2015; 1243:223-236. [PMID: 25384749 DOI: 10.1007/978-1-4939-1872-0_13]
Abstract
The statistical design of a clinical proteomics experiment is a critical part of a well-undertaken investigation. Standard concepts from experimental design such as randomization, replication and blocking should be applied in all experiments, and this is possible when the experimental conditions are well understood by the investigator. The large number of proteins simultaneously considered in proteomic discovery experiments means that determining the number of required replicates to perform a powerful experiment is more complicated than in simple experiments. However, by using information about the nature of an experiment and making simple assumptions, this is achievable for a variety of experiments useful for biomarker discovery and initial validation.
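To make the replicate-number question concrete, the sketch below uses the textbook normal-approximation formula for a two-sided, two-sample comparison, n = 2 (z_(1 - alpha/(2K)) + z_(1 - beta))^2 / d^2 samples per group, with a Bonferroni split of alpha across K proteins. This is an assumed, deliberately simple approach rather than the chapter's own method: it treats the standardized effect size d as known, variances as equal and tests as independent. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect_size: float, n_proteins: int,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group for a two-sided, two-sample comparison of one protein.

    effect_size: standardized difference (mean difference / common SD), assumed known.
    n_proteins: number of proteins tested; alpha is Bonferroni-split across them.
    Uses the normal approximation, which slightly underestimates what a t-test needs.
    """
    z = NormalDist()  # standard normal distribution
    alpha_per_test = alpha / n_proteins
    z_alpha = z.inv_cdf(1 - alpha_per_test / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(replicates_per_group(1.0, n_proteins=1))     # 16
print(replicates_per_group(1.0, n_proteins=1000))  # larger, since each test uses alpha / 1000
```

Raising K shrinks the per-test alpha and so raises the required replicates, which illustrates why the number of proteins considered simultaneously complicates sizing a discovery experiment.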
Affiliation(s)
- David A Cairns
- Section of Oncology and Clinical Research, Leeds Institute of Cancer and Pathology, St. James's University Hospital, Beckett Street, Leeds, LS9 7TF, UK
6
Rudnick PA, Wang X, Yan X, Sedransk N, Stein SE. Improved normalization of systematic biases affecting ion current measurements in label-free proteomics data. Mol Cell Proteomics 2014; 13:1341-51. [PMID: 24563535 DOI: 10.1074/mcp.m113.030593]
Abstract
Normalization is an important step in the analysis of quantitative proteomics data. If this step is ignored, systematic biases can lead to incorrect assumptions about regulation. Most statistical procedures for normalizing proteomics data have been borrowed from genomics where their development has focused on the removal of so-called 'batch effects.' In general, a typical normalization step in proteomics works under the assumption that most peptides/proteins do not change; scaling is then used to give a median log-ratio of 0. The focus of this work was to identify other factors, derived from knowledge of the variables in proteomics, which might be used to improve normalization. Here we have examined the multi-laboratory data sets from Phase I of the NCI's CPTAC program. Surprisingly, the most important bias variables affecting peptide intensities within labs were retention time and charge state. The magnitude of these observations was exaggerated in samples of unequal concentrations or "spike-in" levels, presumably because the average precursor charge for peptides with higher charge state potentials is lower at higher relative sample concentrations. These effects are consistent with reduced protonation during electrospray and demonstrate that the physical properties of the peptides themselves can serve as good reporters of systematic biases. Between labs, retention time, precursor m/z, and peptide length were most commonly the top-ranked bias variables, over the standardly used average intensity (A). A larger set of variables was then used to develop a stepwise normalization procedure. This statistical model was found to perform as well or better on the CPTAC mock biomarker data than other commonly used methods. Furthermore, the method described here does not require a priori knowledge of the systematic biases in a given data set. These improvements can be attributed to the inclusion of variables other than average intensity during normalization.
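The "typical" normalization step described in the abstract (assume most peptides do not change, then scale to a median log-ratio of 0) can be sketched as follows. This is the baseline the paper improves upon, not the authors' stepwise multi-variable procedure; comparing each sample against a single reference run is an assumption, and the intensities and function name are invented.

```python
import math
from statistics import median
from typing import List, Sequence


def median_log_ratio_normalize(sample: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Rescale `sample` so the median log2(sample / reference) across peptides is 0.

    Assumes most peptides do not change, so a non-zero median log-ratio is
    treated as a systematic bias. Inputs are positive intensities for the same
    peptides in the same order.
    """
    log_ratios = [math.log2(s / r) for s, r in zip(sample, reference)]
    factor = 2 ** -median(log_ratios)  # undo the median shift
    return [s * factor for s in sample]


reference = [100.0, 200.0, 400.0, 800.0]
sample = [200.0, 400.0, 800.0, 3200.0]  # invented: overall 2x bias, last peptide truly up
print(median_log_ratio_normalize(sample, reference))  # [100.0, 200.0, 400.0, 1600.0]
```

The genuinely changed peptide keeps its relative increase after scaling. The abstract's point is that this single global factor ignores other bias variables such as retention time and charge state.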
Affiliation(s)
- Paul A Rudnick
- Mass Spectrometry Data Center, National Institute of Standards and Technology, Gaithersburg, Maryland
7
Ejigu BA, Valkenborg D, Baggerman G, Vanaerschot M, Witters E, Dujardin JC, Burzykowski T, Berg M. Evaluation of normalization methods to pave the way towards large-scale LC-MS-based metabolomics profiling experiments. OMICS 2013; 17:473-85. [PMID: 23808607 DOI: 10.1089/omi.2013.0010]
Abstract
Combining liquid chromatography-mass spectrometry (LC-MS)-based metabolomics experiments that were collected over a long period of time remains problematic due to systematic variability between LC-MS measurements. Until now, most normalization methods for LC-MS data are model-driven, based on internal standards or intermediate quality control runs, where an external model is extrapolated to the dataset of interest. In the first part of this article, we evaluate several existing data-driven normalization approaches on LC-MS metabolomics experiments, which do not require the use of internal standards. According to variability measures, each normalization method performs relatively well, showing that the use of any normalization method will greatly improve data-analysis originating from multiple experimental runs. In the second part, we apply cyclic-Loess normalization to a Leishmania sample. This normalization method allows the removal of systematic variability between two measurement blocks over time and maintains the differential metabolites. In conclusion, normalization allows for pooling datasets from different measurement blocks over time and increases the statistical power of the analysis, hence paving the way to increase the scale of LC-MS metabolomics experiments. From our investigation, we recommend data-driven normalization methods over model-driven normalization methods, if only a few internal standards were used. Moreover, data-driven normalization methods are the best option to normalize datasets from untargeted LC-MS experiments.
8
Duncan MW. Good mass spectrometry and its place in good science. J Mass Spectrom 2012; 47:795-809. [PMID: 22707172 DOI: 10.1002/jms.3038]
Abstract
The mass spectrometry community has expanded as instruments became more powerful, user-friendly, affordable and readily available. This opens up opportunities for novice users to perform high impact research, using highly advanced instrumentation. This introductory tutorial is targeted at the novice user working in a research setting. It aims to offer the benefit of other people's experiences and to help newcomers avoid known pitfalls and problematic issues. It discusses some of the essential features of sound analytical chemistry and highlights the need to use validated analytical methods that provide high quality results along with a measure of their uncertainty. Examples are used to illustrate potential pitfalls and their consequences.
Affiliation(s)
- Mark W Duncan
- Division of Endocrinology, Metabolism and Diabetes, Department of Medicine, University of Colorado Denver-School of Medicine, Aurora, Colorado 80045, USA.
9
Bougioukos P, Glotsos D, Cavouras D, Daskalakis A, Kalatzis I, Kostopoulos S, Nikiforidis G, Bezerianos A. An intensity-region driven multi-classifier scheme for improving the classification accuracy of proteomic MS-spectra. Comput Methods Programs Biomed 2010; 99:147-153. [PMID: 20004492 DOI: 10.1016/j.cmpb.2009.11.003] [Received: 08/01/2008] [Revised: 10/26/2009] [Accepted: 11/04/2009]
Abstract
In this study, a pattern recognition system is presented for improving the classification accuracy of MS-spectra by means of gathering information from different MS-spectra intensity regions using a majority vote ensemble combination. The method starts by automatically breaking down all MS-spectra into common intensity regions. Subsequently, the most informative features (m/z values), which might constitute potential significant biomarkers, are extracted from each common intensity region over all the MS-spectra and, finally, normal from ovarian cancer MS-spectra are discriminated using a multi-classifier scheme, with members the Support Vector Machine, the Probabilistic Neural Network and the k-Nearest Neighbour classifiers. Clinical material was obtained from the publicly available ovarian proteomic dataset (8-7-02). To ensure robust and reliable estimates, the proposed pattern recognition system was evaluated using an external cross-validation process. The average overall performance of the system in discriminating normal from cancer ovarian MS-spectra was 97.18% with 98.52% mean sensitivity and 94.84% mean specificity values.
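Only the final majority-vote combination described in the abstract is sketched below. Splitting spectra into common intensity regions, selecting m/z features and training the SVM, PNN and k-NN members are not shown; the three label lists are hypothetical stand-ins for those classifiers' outputs, and the function name is illustrative.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions_per_classifier: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-sample labels from several classifiers by simple majority vote.

    Each inner sequence holds one classifier's labels for the same samples.
    With three voters and two classes a tie is impossible; with ties,
    Counter.most_common keeps first-encountered order, so the earliest classifier wins.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Invented labels standing in for already-trained SVM, PNN and k-NN outputs.
svm = ["cancer", "normal", "cancer", "normal"]
pnn = ["cancer", "cancer", "normal", "normal"]
knn = ["normal", "cancer", "cancer", "normal"]
print(majority_vote([svm, pnn, knn]))  # ['cancer', 'cancer', 'cancer', 'normal']
```

Using an odd number of voters for a two-class problem, as with the three members named in the abstract, avoids needing a tie-breaking policy at all.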
10
Amon LM, Law W, Fitzgibbon MP, Gross JA, O'Briant K, Peterson A, Drescher C, Martin DB, McIntosh M. Integrative proteomic analysis of serum and peritoneal fluids helps identify proteins that are up-regulated in serum of women with ovarian cancer. PLoS One 2010; 5:e11137. [PMID: 20559444 PMCID: PMC2886122 DOI: 10.1371/journal.pone.0011137] [Received: 03/26/2010] [Accepted: 05/26/2010]
Abstract
BACKGROUND We used intensive modern proteomics approaches to identify predictive proteins in ovary cancer. We identify up-regulated proteins in both serum and peritoneal fluid. To evaluate the overall performance of the approach we track the behavior of 20 validated markers across these experiments. METHODOLOGY Mass spectrometry based quantitative proteomics following extensive protein fractionation was used to compare serum of women with serous ovarian cancer to healthy women and women with benign ovarian tumors. Quantitation was achieved by isotopically labeling cysteine amino acids. Label-free mass spectrometry was used to compare peritoneal fluid taken from women with serous ovarian cancer and those with benign tumors. All data were integrated and annotated based on whether the proteins have been previously validated using antibody-based assays. FINDINGS We selected 54 quantified serum proteins and 358 peritoneal fluid proteins whose case-control differences exceeded a predefined threshold. Seventeen proteins were quantified in both materials and 14 are extracellular. Of 19 validated markers that were identified all were found in cancer peritoneal fluid and a subset of 7 were quantified in serum, with one of these proteins, IGFBP1, newly validated here. CONCLUSION Proteome profiling applied to symptomatic ovarian cancer cases identifies a large number of up-regulated serum proteins, many of which are or have been confirmed by immunoassays. The number of currently known validated markers is highest in peritoneal fluid, but they make up a higher percentage of the proteins observed in both serum and peritoneal fluid, suggesting that the 10 additional markers in this group may be high quality candidates.
Affiliation(s)
- Lynn M. Amon
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Wendy Law
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Matthew P. Fitzgibbon
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Jennifer A. Gross
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Kathy O'Briant
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Amelia Peterson
- Institute for Systems Biology, Seattle, Washington, United States of America
- Charles Drescher
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Daniel B. Martin
- Institute for Systems Biology, Seattle, Washington, United States of America
- Martin McIntosh
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
11
Improved reporting of statistical design and analysis: guidelines, education, and editorial policies. Methods Mol Biol 2010; 620:563-98. [PMID: 20652522 DOI: 10.1007/978-1-60761-580-4_22]
Abstract
A majority of original articles published in biomedical journals include some form of statistical analysis. Unfortunately, many of the articles contain errors in statistical design and/or analysis. These errors are worrisome, as the misuse of statistics jeopardizes the process of scientific discovery and the accumulation of scientific knowledge. To help avoid these errors and improve statistical reporting, four approaches are suggested: (1) development of guidelines for statistical reporting that could be adopted by all journals, (2) improvement in statistics curricula in biomedical research programs with an emphasis on hands-on teaching by biostatisticians, (3) expansion and enhancement of biomedical science curricula in statistics programs, and (4) increased participation of biostatisticians in the peer review process along with the adoption of more rigorous journal editorial policies regarding statistics. In this chapter, we provide an overview of these issues with emphasis on the field of molecular biology and highlight the need for continuing efforts on all fronts.
12
Gutstein HB, Morris JS, Annangudi SP, Sweedler JV. Microproteomics: analysis of protein diversity in small samples. Mass Spectrom Rev 2008; 27:316-30. [PMID: 18271009 PMCID: PMC2743962 DOI: 10.1002/mas.20161]
Abstract
Proteomics, the large-scale study of protein expression in organisms, offers the potential to evaluate global changes in protein expression and their post-translational modifications that take place in response to normal or pathological stimuli. One challenge has been the requirement for substantial amounts of tissue in order to perform comprehensive proteomic characterization. In heterogeneous tissues, such as brain, this has limited the application of proteomic methodologies. Efforts to adapt standard methods of tissue sampling, protein extraction, arraying, and identification are reviewed, with an emphasis on those appropriate to smaller samples ranging in size from several microliters down to single cells. The effects of miniaturization on these analyses are highlighted using neuroscience-related examples, as are statistical issues unique to the high-dimensional datasets generated by proteomic experiments.
Affiliation(s)
- Howard B Gutstein
- Department of Anesthesiology, University of Texas-MD Anderson Cancer Center, 1515 Holcombe Blvd., Box 110, Houston, TX 77030-4009, USA.
13
Vandenbogaert M, Li-Thiao-Té S, Kaltenbach HM, Zhang R, Aittokallio T, Schwikowski B. Alignment of LC-MS images, with applications to biomarker discovery and protein identification. Proteomics 2008; 8:650-72. [PMID: 18297649 DOI: 10.1002/pmic.200700791]
Abstract
LC-MS-based approaches have gained considerable interest for the analysis of complex peptide or protein mixtures, due to their potential for full automation and high sampling rates. Advances in resolution and accuracy of modern mass spectrometers allow new analytical LC-MS-based applications, such as biomarker discovery and cross-sample protein identification. Many of these applications compare multiple LC-MS experiments, each of which can be represented as a 2-D image. In this article, we survey current approaches to LC-MS image alignment. LC-MS image alignment corrects for experimental variations in the chromatography and represents a computational key technology for the comparison of LC-MS experiments. It is a required processing step for its two major applications: biomarker discovery and protein identification. Along with descriptions of the computational analysis approaches, we discuss their relative merits and potential pitfalls.
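As a minimal illustration of what retention-time alignment corrects, the sketch below fits a single global linear mapping from one run's time scale to a reference run's, by least squares on landmark features assumed to be already matched between the runs. This is the simplest possible case and an assumption made here; the approaches surveyed generally also handle nonlinear drift (for example with piecewise or warping models). The retention times and function names are invented.

```python
from typing import Callable, Sequence, Tuple


def fit_linear_rt_alignment(rt_sample: Sequence[float],
                            rt_reference: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of rt_reference = slope * rt_sample + intercept (approximately).

    Inputs are retention times of the same landmark features matched across two runs.
    """
    n = len(rt_sample)
    mean_x = sum(rt_sample) / n
    mean_y = sum(rt_reference) / n
    sxx = sum((x - mean_x) ** 2 for x in rt_sample)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rt_sample, rt_reference))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_aligner(slope: float, intercept: float) -> Callable[[float], float]:
    """Map any retention time from the sample run onto the reference time scale."""
    return lambda rt: slope * rt + intercept


# Invented landmark retention times (minutes): the sample run is slightly compressed and shifted.
slope, intercept = fit_linear_rt_alignment([10.0, 20.0, 30.0, 40.0], [12.0, 22.5, 33.0, 43.5])
align = make_aligner(slope, intercept)
print(round(slope, 3), round(intercept, 3), round(align(25.0), 3))  # 1.05 1.5 27.75
```

Once every feature is mapped onto the reference time scale, features from different runs can be matched by (m/z, retention time), which is the prerequisite for the cross-run comparisons the abstract lists.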
14
Fenyö D, Beavis RC. Informatics development: challenges and solutions for MALDI mass spectrometry. Mass Spectrom Rev 2008; 27:1-19. [PMID: 17979143 DOI: 10.1002/mas.20152]
Abstract
Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has been successfully applied to elucidating biological questions through the analysis of proteins, peptides, and nucleic acids. Here, we review the different approaches for analyzing the data generated by MALDI-MS. The first step in the analysis is the processing of the raw data to find peaks that correspond to the analytes. The peaks are characterized by their areas (or heights) and their centroids. The peak area can be used as a measure of the quantity of the analyte, and the centroid can be used to determine the mass of the analyte. The masses are then compared to models of the analyte, and these models are ranked according to how well they fit the data, and their significance is calculated. This allows the determination of the identity (sequence and modifications) of the analytes. We show how this general data analysis workflow is applied to protein and nucleic acid chemistry as well as proteomics.
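The peak characterization step described in the abstract (an area or height as a quantity measure, a centroid as a mass estimate) can be sketched for a single peak window. The trapezoidal area and intensity-weighted centroid used here are common choices assumed for illustration; peak detection and baseline subtraction are taken as already done, and the peak and function name are invented.

```python
from typing import Sequence, Tuple


def peak_area_and_centroid(mz: Sequence[float], intensity: Sequence[float]) -> Tuple[float, float]:
    """Return (area, centroid) for one baseline-corrected peak window.

    area: trapezoidal integral of intensity over m/z.
    centroid: intensity-weighted mean m/z of the points in the window.
    """
    if len(mz) != len(intensity) or len(mz) < 2:
        raise ValueError("need matching m/z and intensity arrays with at least 2 points")
    total = sum(intensity)
    if total <= 0:
        raise ValueError("peak window has no positive intensity")
    area = sum((mz[i + 1] - mz[i]) * (intensity[i] + intensity[i + 1]) / 2
               for i in range(len(mz) - 1))
    centroid = sum(m * y for m, y in zip(mz, intensity)) / total
    return area, centroid


# Invented, symmetric three-point peak around m/z 1000.
print(peak_area_and_centroid([999.0, 1000.0, 1001.0], [1.0, 2.0, 1.0]))  # (3.0, 1000.0)
```

For an asymmetric peak the centroid shifts toward the heavier side, which is why centroiding usually gives a better mass estimate than taking the m/z of the single most intense point.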
Affiliation(s)
- David Fenyö
- The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA.
15
Abstract
Proteomics holds the promise of evaluating global changes in protein expression and post-translational modification in response to environmental stimuli. However, difficulties in achieving cellular anatomic resolution and extracting specific types of proteins from cells have limited the efficacy of these techniques. Laser capture microdissection has provided a solution to the problem of anatomical resolution in tissues. New extraction methodologies have expanded the range of proteins identified in subsequent analyses. This review will examine the application of laser capture microdissection to proteomic tissue sampling, and subsequent extraction of these samples for differential expression analysis. Statistical and other quantitative issues important for the analysis of the highly complex datasets generated are also reviewed.
Affiliation(s)
- Howard B Gutstein
- MD Anderson Cancer Center, 1515 Holcombe Blvd, Box 110, Houston, TX 77030-4009, USA.
16
Ransohoff DF. How to improve reliability and efficiency of research about molecular markers: roles of phases, guidelines, and study design. J Clin Epidemiol 2007; 60:1205-19. [PMID: 17998073 DOI: 10.1016/j.jclinepi.2007.04.020] [Received: 09/15/2006] [Revised: 04/01/2007] [Accepted: 04/12/2007]
Abstract
BACKGROUND AND OBJECTIVE The search for molecular markers for cancer, using "discovery-based" techniques, has resulted in claims of a very high degree of discrimination both for cancer diagnosis (e.g., serum proteomics patterns) and prognosis (e.g., RNA expression genomic signatures). However, many promising initial results have been found to be unreliable or not reproducible, and the larger process of discovery can seem slow and inefficient. To improve the process to develop molecular markers, proposals to use "phases" and "guidelines" have been made, based on experience with the process of drug development and randomized controlled clinical trials. The objective is to help improve the reliability and efficiency of development of molecular markers for cancer diagnosis. STUDY DESIGN AND SETTING The literature was searched to identify important current problems (in serum proteomics for cancer diagnosis and RNA expression genomics for cancer prognosis), and the roles of tools ("phases," "guidelines," and "study design") to address those problems are considered. Based on lessons learned, approaches for the future are discussed, some of which may seem "radical" compared with drug development. RESULTS Phases identify and organize questions to be addressed by individual studies. Guidelines identify features of design and conduct to be reported so that each study's reliability can be judged. Study design involves the myriad details and choices involved in actual planning and conduct of a study. Study design is most important in the sense of determining whether a study is reliable or not. Studies that are unreliable, because of problems from chance and bias, constitute a major current problem leading to inflated expectations, wasted effort, and inefficiency in the larger process of development.
By considering fundamental principles, it may be possible to identify approaches that are different than those used in drug development, while preserving reliability and efficiency. CONCLUSION Phases and guidelines have important roles, but issues in study design address the fundamental problems that compromise reliability and efficiency. Tools to study markers are underdeveloped and will evolve over time, perhaps to include seemingly radical approaches.
Affiliation(s)
- David F Ransohoff
- Department of Medicine, University of North Carolina at Chapel Hill, CB# 7080, 4103 Bioinformatics Building, Chapel Hill, NC 27599-7080, USA.