1
|
Association of Cognitive Performance with Frailty in Older Individuals with Cognitive Complaints. J Nutr Health Aging 2022; 26:89-95. [PMID: 35067709 DOI: 10.1007/s12603-021-1712-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
OBJECTIVES Frailty is a risk factor for poor cognitive performance in older adults. However, few studies have evaluated the association of cognitive performance with frailty in a low- to middle-income country (LMIC). This study aimed to investigate the association between cognitive performance and frailty in older adults with memory complaints in Brazil. Secondarily, we aimed to assess the association of cognitive performance with gait speed and grip strength. DESIGN Cross-sectional study. SETTING Outpatient service in an LMIC. PARTICIPANTS Older adults with memory complaints reported by the participants, their proxies, or their physicians. MEASUREMENTS Frailty was evaluated using the Cardiovascular Health Study criteria. A neuropsychological battery evaluated memory, attention, language, visuospatial function, and executive function. Linear regression analysis with adjustment for age, sex, and education was used. We also evaluated the interaction of education with frailty, grip strength, and gait speed. RESULTS Prefrailty was associated with poor performance in the memory domain, and slower gait speed was associated with worse performance in memory, attention, language, and executive function. Frailty and grip strength were not associated with cognitive performance. Interactions of education with gait speed were significant for global performance, as well as for attention and visuospatial ability. CONCLUSION In elderly patients with memory complaints, prefrailty was associated with poor memory performance. Slowness was associated with poorer performance in some cognitive domains, mainly in participants with low education.
|
2
|
[Construction of artificial neural network model for predicting the efficacy of first-line FOLFOX chemotherapy for metastatic colorectal cancer]. ZHONGHUA ZHONG LIU ZA ZHI [CHINESE JOURNAL OF ONCOLOGY] 2021; 43:202-206. [PMID: 33601485 DOI: 10.3760/cma.j.cn112152-20200419-00355] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
Objective: To explore and establish an artificial neural network (ANN) model for predicting the efficacy of first-line FOLFOX chemotherapy for metastatic colorectal cancer. Methods: A FOLFOX chemotherapy dataset from patients with metastatic colorectal cancer (mCRC) (GSE104645) was downloaded from the GEO database as the training set. Efficacy of the FOLFOX protocol was divided into two groups: the chemo-sensitive group (complete response and partial response; 31 cases) and the chemo-resistant group (stable disease and progressive disease; 23 cases). Chip data (accession number: GSE69657) from Fujian Medical University Union Hospital were chosen as the test set; a total of 30 patients were enrolled, including 13 in the sensitive group and 17 in the resistant group. Batch effect correction was performed on the expression values of the two matrices using the ComBat package in R 3.5.1. Differential gene expression between the sensitive and resistant groups in GSE104645 was analyzed with the GEO2R platform. P<0.05 and |log2FC|>0.33 (FC, fold change) were used as thresholds to screen for resistance and sensitivity genes of the FOLFOX regimen. An ANN based on a multi-layer perceptron (MLP) was trained on the GSE104645 dataset to model response to the FOLFOX regimen. The GSE69657 expression matrix and clinical efficacy parameters were then used for retrospective verification. Receiver operating characteristic (ROC) curves were used to evaluate the test results and predictive power. Results: A total of 2,076 differentially expressed genes in GSE104645 were selected, of which 822 genes were up-regulated and 1,254 genes were down-regulated in the chemo-resistant group. The down-regulated genes were sensitive genes.
GO analysis of the biological processes involving the differentially expressed genes revealed that they were mainly involved in the regulation of substance metabolism. A total of 39 genes were included in the final model construction. This was a neural network model with two hidden layers. The accuracy of predicting training samples and test samples was 75.7% and 76.5%, respectively, and the area under the ROC curve was 0.875. When the chip dataset of our department (GSE69657) was used as the test set, the area under the ROC curve was 0.778. Conclusions: In this study, an ANN model was successfully constructed to predict the efficacy of the first-line FOLFOX regimen for metastatic colorectal cancer based on microarray data, and an independent external verification was also conducted. The model has good stability and good predictive efficiency. In addition, the results suggest that the gene functions related to oxaliplatin resistance are mainly enriched in the regulation of substance metabolism.
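The gene-screening rule quoted in the Methods (P<0.05 and an absolute log2 fold change above 0.33) can be illustrated with a minimal Python sketch. The function name, the row layout, and the placeholder gene names are hypothetical, and the sign convention (positive log2FC meaning higher expression in the resistant group) is an assumption; the abstract does not state how GEO2R was configured.

```python
# Thresholds as quoted in the abstract.
P_CUTOFF = 0.05
LOG2FC_CUTOFF = 0.33


def screen_genes(rows, p_cutoff=P_CUTOFF, lfc_cutoff=LOG2FC_CUTOFF):
    """Split (gene, log2fc, p_value) rows into up- and down-regulated lists.

    Assumption: log2fc > 0 means higher expression in the resistant group.
    """
    up, down = [], []
    for gene, log2fc, p_value in rows:
        if p_value < p_cutoff and abs(log2fc) > lfc_cutoff:
            (up if log2fc > 0 else down).append(gene)
    return up, down


# Placeholder rows for illustration only; these are not data from the study.
example_rows = [
    ("GENE_A", 0.80, 0.001),   # passes both thresholds, up-regulated
    ("GENE_B", -0.50, 0.020),  # passes both thresholds, down-regulated
    ("GENE_C", 0.10, 0.001),   # fold change too small
    ("GENE_D", -0.90, 0.200),  # not significant
]
up_genes, down_genes = screen_genes(example_rows)
```

Under the assumed sign convention, the down-regulated list corresponds to what the abstract calls sensitive genes.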
|
3
|
Assessment of 135 794 Pediatric Patients Tested for Severe Acute Respiratory Syndrome Coronavirus 2 Across the United States. JAMA Pediatr 2021; 175:176-184. [PMID: 33226415 PMCID: PMC7684518 DOI: 10.1001/jamapediatrics.2020.5052] [Citation(s) in RCA: 150] [Impact Index Per Article: 50.0] [Indexed: 01/10/2023]
Abstract
IMPORTANCE There is limited information on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) testing and infection among pediatric patients across the United States. OBJECTIVE To describe testing for SARS-CoV-2 and the epidemiology of infected patients. DESIGN, SETTING, AND PARTICIPANTS A retrospective cohort study was conducted using electronic health record data from 135 794 patients younger than 25 years who were tested for SARS-CoV-2 from January 1 through September 8, 2020. Data were from PEDSnet, a network of 7 US pediatric health systems, comprising 6.5 million patients primarily from 11 states. Data analysis was performed from September 8 to 24, 2020. EXPOSURE Testing for SARS-CoV-2. MAIN OUTCOMES AND MEASURES SARS-CoV-2 infection and coronavirus disease 2019 (COVID-19) illness. RESULTS A total of 135 794 pediatric patients (53% male; mean [SD] age, 8.8 [6.7] years; 3% Asian patients, 15% Black patients, 11% Hispanic patients, and 59% White patients; 290 per 10 000 population [range, 155-395 per 10 000 population across health systems]) were tested for SARS-CoV-2, and 5374 (4%) were infected with the virus (12 per 10 000 population [range, 7-16 per 10 000 population]). Compared with White patients, those of Black, Hispanic, and Asian race/ethnicity had lower rates of testing (Black: odds ratio [OR], 0.70 [95% CI, 0.68-0.72]; Hispanic: OR, 0.65 [95% CI, 0.63-0.67]; Asian: OR, 0.60 [95% CI, 0.57-0.63]); however, they were significantly more likely to have positive test results (Black: OR, 2.66 [95% CI, 2.43-2.90]; Hispanic: OR, 3.75 [95% CI, 3.39-4.15]; Asian: OR, 2.04 [95% CI, 1.69-2.48]). Older age (5-11 years: OR, 1.25 [95% CI, 1.13-1.38]; 12-17 years: OR, 1.92 [95% CI, 1.73-2.12]; 18-24 years: OR, 3.51 [95% CI, 3.11-3.97]), public payer (OR, 1.43 [95% CI, 1.31-1.57]), outpatient testing (OR, 2.13 [95% CI, 1.86-2.44]), and emergency department testing (OR, 3.16 [95% CI, 2.72-3.67]) were also associated with increased risk of infection.
In univariate analyses, nonmalignant chronic disease was associated with lower likelihood of testing, and preexisting respiratory conditions were associated with lower risk of positive test results (standardized ratio [SR], 0.78 [95% CI, 0.73-0.84]). However, several other diagnosis groups were associated with a higher risk of positive test results: malignant disorders (SR, 1.54 [95% CI, 1.19-1.93]), cardiac disorders (SR, 1.18 [95% CI, 1.05-1.32]), endocrinologic disorders (SR, 1.52 [95% CI, 1.31-1.75]), gastrointestinal disorders (SR, 2.00 [95% CI, 1.04-1.38]), genetic disorders (SR, 1.19 [95% CI, 1.00-1.40]), hematologic disorders (SR, 1.26 [95% CI, 1.06-1.47]), musculoskeletal disorders (SR, 1.18 [95% CI, 1.07-1.30]), mental health disorders (SR, 1.20 [95% CI, 1.10-1.30]), and metabolic disorders (SR, 1.42 [95% CI, 1.24-1.61]). Among the 5374 patients with positive test results, 359 (7%) were hospitalized for respiratory, hypotensive, or COVID-19-specific illness. Of these, 99 (28%) required intensive care unit services, and 33 (9%) required mechanical ventilation. The case fatality rate was 0.2% (8 of 5374). The number of patients with a diagnosis of Kawasaki disease in early 2020 was 40% lower (259 vs 433 and 430) than in 2018 or 2019. CONCLUSIONS AND RELEVANCE In this large cohort study of US pediatric patients, SARS-CoV-2 infection rates were low, and clinical manifestations were typically mild. Black, Hispanic, and Asian race/ethnicity; adolescence and young adulthood; and nonrespiratory chronic medical conditions were associated with identified infection. Kawasaki disease diagnosis is not an effective proxy for multisystem inflammatory syndrome of childhood.
|
4
|
[Establishment of clinical features and prognostic scoring model in early-stage hepatitis B-related acute-on-chronic liver failure]. ZHONGHUA GAN ZANG BING ZA ZHI = ZHONGHUA GANZANGBING ZAZHI = CHINESE JOURNAL OF HEPATOLOGY 2020; 28:441-445. [PMID: 32403883 DOI: 10.3760/cma.j.cn501113-20200316-00116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Objective: To explore the clinical characteristics and establish a corresponding prognostic scoring model in patients with early-stage clinical features of hepatitis B-induced acute-on-chronic liver failure (HBV-ACLF). Methods: Clinical characteristics of 725 cases with hepatitis B-related acute-on-chronic hepatic dysfunction (HBV-ACHD) were retrospectively analyzed using the Chinese Group on the Study of Severe Hepatitis B (COSSH) cohort. Independent risk factors associated with 90-day prognosis were identified by multivariate Cox regression to establish a prognostic scoring model, which was validated in 500 internal and 390 external HBV-ACHD patients. Results: Among the 725 cases with HBV-ACHD, 76.8% were male, 96.8% had underlying cirrhosis, 66.5% had ascites as a complication, 4.1% had coagulation failure as an organ failure, and the 90-day mortality rate was 9.2%. Multivariate Cox regression analysis showed that TBil, WBC and ALP were the best predictors of 90-day mortality in HBV-ACHD patients. The established scoring model was COSSH-ACHDs = 0.75 × ln(WBC) + 0.57 × ln(TBil) - 0.94 × ln(ALP) + 10. The area under the receiver operating characteristic curve (AUROC) of the score was significantly higher than those of MELD, MELD-Na, CTP and CLIF-C ADs (P < 0.05). Analyses of the 500-case internal random selection group and the 390-case external group gave similar results. Conclusion: HBV-ACHD patients are a population with decompensated cirrhosis combined with a small number of organ failures, and the 90-day mortality rate is 9.2%. COSSH-ACHDs has higher predictive value for the 90-day prognosis of HBV-ACHD patients, and thus provides evidence for early clinical diagnosis and treatment.
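As a worked illustration of the scoring formula quoted in this abstract, the following Python sketch evaluates it for given laboratory values. The function name and the input validation are hypothetical, and the abstract does not state measurement units, so the units expected for WBC, TBil, and ALP are an assumption that would need to be checked against the full paper. This is not a clinical tool.

```python
import math


def cossh_achd_score(wbc, tbil, alp):
    """Evaluate 0.75*ln(WBC) + 0.57*ln(TBil) - 0.94*ln(ALP) + 10.

    Coefficients are copied from the abstract; units are NOT given there,
    so callers must supply values in whatever units the full paper uses.
    """
    for name, value in (("wbc", wbc), ("tbil", tbil), ("alp", alp)):
        if value <= 0:
            # The natural logarithm is undefined for non-positive inputs.
            raise ValueError(f"{name} must be positive, got {value}")
    return (0.75 * math.log(wbc)
            + 0.57 * math.log(tbil)
            - 0.94 * math.log(alp)
            + 10)


# With every input equal to 1, all logarithms vanish and only the constant remains.
baseline = cossh_achd_score(1.0, 1.0, 1.0)
```

Because WBC and TBil enter with positive coefficients and ALP with a negative one, the score rises with the first two and falls as ALP increases.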
|
5
|
The Financial Impact of Genetic Diseases in a Pediatric Accountable Care Organization. Front Public Health 2020; 8:58. [PMID: 32181236 PMCID: PMC7059305 DOI: 10.3389/fpubh.2020.00058] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/11/2019] [Accepted: 02/17/2020] [Indexed: 11/13/2022] [Open Access]
Abstract
Background: Previous studies revealed that patients with genetic disease have more frequent and longer hospitalizations and therefore higher healthcare costs. To understand the financial impact of genetic disease on a pediatric accountable care organization (ACO), we analyzed medical claims from 2014 provided by Partners for Kids, an ACO in partnership with Nationwide Children's Hospital (NCH; Columbus, OH, USA). Methods: The study population included insurance claims from 258,399 children. We assigned patients to four categories (1A, 1B, 2, and 3) based on the strength of the genetic basis of disease. Results: We identified 22.7% of patients as category 1A or 1B, having a disease with a "strong genetic basis" (e.g., single-gene diseases, chromosomal abnormalities). Total ACO paid claims in 2014 were $379M, of which $161M (42.5%) was attributed to category 1 patients. Furthermore, we identified 23.3% of patients as category 2, having a disease with a suspected genetic component or predisposition (e.g., asthma, type 1 diabetes), who accounted for an additional 28.6% of 2014 costs. Category 1 patients were more likely to experience at least one hospitalization than category 3 patients, those without genetic disease (odds ratio [OR] = 4.12; 95% confidence interval [CI] = 3.86-4.39; p < 0.0001). Overall, category 1 patients experienced nearly five times the number of inpatient (IP) admissions and twice the number of outpatient (OP) visits compared to category 3 patients (p < 0.0001). Conclusion: Nearly half (42.5%) of healthcare paid claims cost in 2014 for this study population were accounted for by patients with single-gene diseases or chromosomal abnormalities. These findings precede and support a need for an ACO to plan for effective healthcare strategies and capitation models for children with genetic disease.
|
6
|
Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics 2020; 36:1241-1251. [PMID: 31584634 PMCID: PMC7703771 DOI: 10.1093/bioinformatics/btz718] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 04/09/2019] [Revised: 08/25/2019] [Accepted: 09/26/2019] [Indexed: 01/12/2023] [Open Access]
Abstract
MOTIVATION Graph embedding learning, which aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods have been evaluated on social and information networks and have not been comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding method) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art. RESULTS We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, protein-protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification, protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis. Compared with three state-of-the-art methods for DDAs, DDIs and protein function predictions, the recent graph embedding methods achieve competitive performance without using any biological features, and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks.
AVAILABILITY AND IMPLEMENTATION As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at: https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
7
|
A deep learning model for pediatric patient risk stratification. THE AMERICAN JOURNAL OF MANAGED CARE 2019; 25:e310-e315. [PMID: 31622071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
OBJECTIVES Current models for patient risk prediction rely on practitioner expertise and domain knowledge. This study presents a deep learning model (a type of machine learning that does not require human inputs) to analyze complex clinical and financial data for population risk stratification. STUDY DESIGN A comparative predictive analysis of deep learning versus other popular risk prediction modeling strategies using medical claims data from a cohort of 112,641 pediatric accountable care organization members. METHODS "Skip-Gram," an unsupervised deep learning approach that uses neural networks for prediction modeling, used data from 2014 and 2015 to predict the risk of hospitalization in 2016. The area under the curve (AUC) of the deep learning model was compared with that of both the Clinical Classifications Software and the commercial DxCG Intelligence predictive risk models, each with and without demographic and utilization features. We then calculated costs for patients in the top 1% and 5% of hospitalization risk identified by each model. RESULTS The deep learning model performed the best across 6 predictive models, with an AUC of 75.1%. The top 1% of members selected by the deep learning model had a combined healthcare cost $5 million higher than that of the group identified by the DxCG Intelligence model. CONCLUSIONS The deep learning model outperforms the traditional risk models in prospective hospitalization prediction. Thus, deep learning may improve the ability of managed care organizations to perform predictive modeling of financial risk, in addition to improving the accuracy of risk stratification for population health management activities.
|
8
|
Abstract
The use of MALDI-TOF mass spectrometry as a means of analyzing the proteome has been evaluated extensively in recent years. One of the limitations of this technique that has impeded the development of robust data analysis algorithms is the variability in the location of protein ion signals along the x-axis. We studied technical variations of MALDI-TOF measurements in the context of proteomics profiling. By acquiring a benchmark data set with five replicates, we estimated 76% to 85% of the total variance is due to phase variation. We devised a lobster plot, so named because of the resemblance to a lobster claw, to help detect the phase variation in replicates. We also investigated a peak alignment algorithm to remove the phase variation. This operation is analogous to the normalization step in microarray data analysis. Only after this critical step can features of biological interest be clearly revealed. With the help of principal component analysis, we demonstrated that after peak alignment, the differences among replicates are reduced. We compared this approach to peak alignment with a model-based calibration approach in which there was known information about peaks in common among all spectra. Finally, we examined the potential value at each point in an analysis pipeline of having a set of methods available that includes parametric, semiparametric and nonparametric methods; among such methods are those that benefit from the use of prior information.
|
9
|
[Solitary fibrous tumor/hemangiopericytoma of central nervous system: a clinicopathologic analysis of 71 cases]. ZHONGHUA BING LI XUE ZA ZHI = CHINESE JOURNAL OF PATHOLOGY 2017; 46:465-470. [PMID: 28728219 DOI: 10.3760/cma.j.issn.0529-5807.2017.07.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
Objective: As solitary fibrous tumor (SFT) and hemangiopericytoma (HPC) share the same molecular genetic features, the 2016 WHO classification of central nervous system (CNS) tumors created the combined term SFT/HPC and assigned three grades. This study aims to investigate the clinicopathologic characteristics, diagnosis, differential diagnosis and prognosis of CNS SFT/HPC. Methods: Seventy-one cases of CNS SFT and HPC were retrospectively reclassified and studied. Histopathological, immunohistochemical and imaging features were analyzed. The follow-up data were analyzed. Results: There were 37 male and 34 female patients. The median age was 48 years (range, 3-77 years). Twelve cases (17%) were WHO grade Ⅰ, 26 (37%) were WHO grade Ⅱ and 33 (46%) were WHO grade Ⅲ. Microscopically, the tumor could show a traditional SFT phenotype, an HPC phenotype or a mixed phenotype. Immunohistochemically, 97% (69/71) were positive for STAT6, with 96% (66/69) showing diffuse strong staining. Approximately 90% were diffusely positive for bcl-2, CD99 and vimentin. The expression rate of CD34 decreased with increasing tumor grade, and the mean expression rate was 78%. SSTR2a was variably expressed in 10% (7/71) of cases, including one case showing strong cytoplasmic staining. A few cases expressed EMA, CD57 and S-100 focally. The Ki-67 index ranged from 1% to 50%. Thirty-four patients were followed up for 8-130 months; 12 patients (35%) had recurrences, and two (6%) had liver metastases. Conclusions: CNS SFT/HPC is relatively uncommon. There was significant morphological overlap or transition between different grades. STAT6 is a specific marker for the diagnosis of this tumor. Surgical resection is the preferred treatment. WHO grade Ⅱ and Ⅲ SFT/HPC show rates of local recurrence and systemic metastasis, with liver being the most common site of extracranial metastasis.
|
10
|
352 Re:fine NEISS: a real-time interaction search system for consumer product-related injury ED visits in the United States. Inj Prev 2016. [DOI: 10.1136/injuryprev-2016-042156.352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
|
11
|
'RE:fine drugs': an interactive dashboard to access drug repurposing opportunities. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw083. [PMID: 27189611 PMCID: PMC4869799 DOI: 10.1093/database/baw083] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 02/16/2016] [Accepted: 04/26/2016] [Indexed: 12/01/2022]
Abstract
The process of discovering new drugs has been extremely costly and slow in the last decades despite enormous investment in pharmaceutical research. Drug repurposing enables researchers to speed up the process of discovering other conditions that existing drugs can effectively treat, with low cost and fast FDA approval. Here, we introduce ‘RE:fine Drugs’, a freely available interactive website for integrated search and discovery of drug repurposing candidates from GWAS and PheWAS repurposing datasets constructed using previously reported methods in Nature Biotechnology. ‘RE:fine Drugs’ demonstrates the possibilities to identify and prioritize novelty of candidates for drug repurposing based on the theory of transitive Drug–Gene–Disease triads. This public website provides a starting point for research, industry, clinical and regulatory communities to accelerate the investigation and validation of new therapeutic use of old drugs. Database URL: http://drug-repurposing.nationwidechildrens.org
|
12
|
MD-CTS: An integrated terminology reference of clinical and translational medicine. Comput Struct Biotechnol J 2016; 14:131-4. [PMID: 27069559 PMCID: PMC4810012 DOI: 10.1016/j.csbj.2016.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2015] [Revised: 02/22/2016] [Accepted: 02/22/2016] [Indexed: 11/19/2022] [Open Access]
Abstract
New vocabularies are rapidly evolving in the literature relative to the practice of clinical medicine and translational research. To provide integrated access to new terms, we developed a mobile and desktop online reference, the Marshfield Dictionary of Clinical and Translational Science (MD-CTS). It is the first public resource that comprehensively integrates Wiktionary (word definition), BioPortal (ontology), Wiki (image reference), and Medline abstract (word usage) information. MD-CTS is accessible at http://spellchecker.mfldclin.edu/. The website provides a broadened capacity for the wider clinical and translational science community to keep pace with newly emerging scientific vocabulary. An initial evaluation using 63 randomly selected biomedical words suggests that online references generally provided better coverage (73%–95%) than paper-based dictionaries (57%–71%).
|
13
|
A Review on Genomics APIs. Comput Struct Biotechnol J 2015; 14:8-15. [PMID: 26702340 PMCID: PMC4669666 DOI: 10.1016/j.csbj.2015.10.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/31/2015] [Revised: 10/15/2015] [Accepted: 10/23/2015] [Indexed: 11/17/2022] [Open Access]
Abstract
The constant improvement and falling prices of whole human genome Next Generation Sequencing (NGS) have resulted in rapid adoption of genomic information at both clinics and research institutions. The complexity of genomics data, due to its large volume and diversity, together with the need for genomic data sharing, has resulted in the creation of Application Programming Interfaces (APIs) for secure, modular, interoperable access to genomic data from different applications, platforms, and even organizations. The Genomics APIs are a set of special protocols that assist software developers in dealing with multiple genomic data sources for building seamless, interoperable applications leading to the advancement of both genomic and clinical research. These APIs help define a standard for retrieval of genomic data from multiple sources and better package genomic information for integration with Electronic Health Records. This review covers three currently available Genomics APIs: a) Google Genomics, b) SMART Genomics, and c) 23andMe. The functionalities, reference implementations (if available) and authentication protocols of each API are reviewed. A comparative analysis of the different features across the three APIs is provided in the Discussion section. Though Genomics APIs are still under active development and have yet to reach widespread adoption, they hold the promise to make building complicated genomics applications easier, with downstream constructive effects on healthcare.
|
14
|
Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biol 2015; 16:133. [PMID: 26109056 PMCID: PMC4506430 DOI: 10.1186/s13059-015-0694-1] [Citation(s) in RCA: 241] [Impact Index Per Article: 26.8] [Received: 02/10/2015] [Accepted: 06/12/2015] [Indexed: 12/22/2022] [Open Access]
Abstract
Background Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. Results We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. Conclusions We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice. 
Electronic supplementary material The online version of this article (doi:10.1186/s13059-015-0694-1) contains supplementary material, which is available to authorized users.
|
15
|
Abstract
Background Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis. Methods Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation). Results We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times were almost linearly scalable with number of data nodes. With 20 data nodes, SeqHBase took about 5 secs to analyse WES familial data and approximately 1 min to analyse WGS familial data. Conclusions These results demonstrate SeqHBase's high efficiency and scalability, which is necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders.
|
16
|
|
17
|
Association of cyclin D1 and survivin expression with sensitivity to radiotherapy in patients with nasopharyngeal carcinoma. GENETICS AND MOLECULAR RESEARCH 2014; 13:3502-9. [PMID: 24615109 DOI: 10.4238/2014.february.14.6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/03/2022]
Abstract
The association of cyclin D1 and survivin protein expression with radiotherapy sensitivity in patients with nasopharyngeal carcinoma was investigated. Biopsy specimens of 72 patients with nasopharyngeal carcinoma were collected before the initiation of radiotherapy (49 cases were in the radiation-sensitive group and 23 cases were in the radiation-insensitive group). Conventional hematoxylin and eosin staining was used for tissue typing. The immunohistochemical SP method was used to detect cyclin D1 and survivin protein expression levels. IBM SPSS Statistics 20 was used for the chi-squared test and the Spearman correlation analysis. In the 72 cases, the high expression rates of cyclin D1 were 28.6% (14/49) and 69.6% (16/23) in the radiotherapy-sensitive group and in the radiotherapy-insensitive group, respectively, and the difference between groups was statistically significant (P<0.05). The high expression rates of survivin were 34.7% (17/49) and 73.9% (17/23) in the radiotherapy-sensitive group and in the radiotherapy-insensitive group, respectively, which differed significantly (P<0.05). The protein expression levels of cyclin D1 and survivin were positively correlated (Spearman's r=0.353, P<0.05). Cyclin D1 and survivin expression levels were negatively correlated with the radiosensitivity of nasopharyngeal carcinoma. Cyclin D1 and survivin may be used as molecular markers to predict sensitivity to radiotherapy.
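The group comparison reported in this abstract can be reproduced approximately from the published counts. The sketch below computes the uncorrected Pearson chi-squared statistic for a 2x2 table; whether the authors applied a continuity correction is not stated, so the uncorrected form is an assumption, and the function name `chi2_2x2` is illustrative only.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]], where rows are groups and columns are
    high / not-high expression."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts taken from the abstract: high expression in 14/49 sensitive and
# 16/23 insensitive cases (cyclin D1), 17/49 and 17/23 (survivin).
cyclin_d1 = chi2_2x2(14, 49 - 14, 16, 23 - 16)
survivin = chi2_2x2(17, 49 - 17, 17, 23 - 17)

# Critical value of chi-squared with 1 degree of freedom at alpha = 0.05.
CRITICAL_0_05 = 3.841

print(f"cyclin D1: chi2 = {cyclin_d1:.2f}, significant = {cyclin_d1 > CRITICAL_0_05}")
print(f"survivin:  chi2 = {survivin:.2f}, significant = {survivin > CRITICAL_0_05}")
```

Both statistics exceed the 1-degree-of-freedom critical value, which is consistent with the reported P<0.05.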
|
18
|
Exploring the FDA adverse event reporting system to generate hypotheses for monitoring of disease characteristics. Clin Pharmacol Ther 2014; 95:496-8. [PMID: 24448476 DOI: 10.1038/clpt.2014.17] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 09/30/2013] [Accepted: 01/14/2014] [Indexed: 11/09/2022]
Abstract
The US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) is a database for postmarketing drug safety monitoring and influences changes in FDA safety guidance documents such as drug labels. The number of cases in the FAERS has rapidly increased with the improvement of submission methods and data standards and thus has become an important resource for regulatory science. Although the FAERS has been predominantly used for safety signal detection, this study explored its utility for disease characteristics.
|
19
|
Abstract
mzXML (extensible markup language) is one of the pioneering data formats for mass spectrometry-based proteomics data collection. It is an open data format that has benefited and evolved as a result of the input of many groups, and it continues to evolve. Due to its dynamic history, its structure, purpose and applicability have all changed with time, meaning that groups that have looked at the standard at different points during its evolution have differing impressions of the usefulness of mzXML. In discussing mzXML, it is important to understand what mzXML is not. First, mzXML does not capture the raw data. Second, mzXML is not sufficient for regulatory submission. Third, mzXML is not optimized for computation and, finally, mzXML does not capture the experiment design. In general, it is the authors' opinion that XML is not a panacea for bioinformatics or a substitute for good data representation, and groups that want to use mzXML (or other XML-based representations) directly for data storage or computation will encounter performance and scalability problems. With these limitations in mind, the authors conclude that mzXML is, nonetheless, an indispensable data exchange format for proteomics.
|
20
|
Some experiences and opportunities for big data in translational research. Genet Med 2013; 15:802-9. [PMID: 24008998 PMCID: PMC3906918 DOI: 10.1038/gim.2013.121] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 04/16/2013] [Accepted: 07/09/2013] [Indexed: 01/25/2023] Open
Abstract
Health care has become increasingly information intensive. The advent of genomic data, integrated into patient care, significantly accelerates the complexity and amount of clinical data. Translational research in the present day increasingly embraces new biomedical discovery in this data-intensive world, thus entering the domain of "big data." The Electronic Medical Records and Genomics consortium has taught us many lessons, while simultaneously advances in commodity computing methods enable the academic community to affordably manage and process big data. Although great promise can emerge from the adoption of big data methods and philosophy, the heterogeneity and complexity of clinical data, in particular, pose additional challenges for big data inferencing and clinical application. However, the ultimate comparability and consistency of heterogeneous clinical information sources can be enhanced by existing and emerging data standards, which promise to bring order to clinical data chaos. Meaningful Use data standards in particular have already simplified the task of identifying clinical phenotyping patterns in electronic health records.
|
21
|
Sensorial differences according to sex and ages. Oral Dis 2013; 20:e103-10. [DOI: 10.1111/odi.12145] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 05/22/2012] [Revised: 04/11/2013] [Accepted: 05/29/2013] [Indexed: 01/21/2023]
|
22
|
Abstract
Disease and Gene Annotations database (DGA, http://dga.nubic.northwestern.edu) is a collaborative effort aiming to provide a comprehensive and integrative annotation of the human genes in disease network context by integrating computable controlled vocabulary of the Disease Ontology (DO version 3 revision 2510, which has 8043 inherited, developmental and acquired human diseases), NCBI Gene Reference Into Function (GeneRIF) and molecular interaction network (MIN). DGA integrates these resources together using semantic mappings to build an integrative set of disease-to-gene and gene-to-gene relationships with excellent coverage based on current knowledge. DGA is kept current by periodically reparsing DO, GeneRIF, and MINs. DGA provides a user-friendly and interactive web interface system enabling users to efficiently query, download and visualize the DO tree structure and annotations as a tree, a network graph or a tabular list. To facilitate integrative analysis, DGA provides a web service Application Programming Interface for integration with external analytic tools.
|
23
|
Opportunities in systems biology to discover mechanisms and repurpose drugs for CNS diseases. Drug Discov Today 2012; 17:1208-16. [DOI: 10.1016/j.drudis.2012.06.015] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 02/27/2012] [Revised: 06/04/2012] [Accepted: 06/25/2012] [Indexed: 01/07/2023]
|
24
|
A statistical framework for accurate taxonomic assignment of metagenomic sequencing reads. PLoS One 2012; 7:e46450. [PMID: 23049702 PMCID: PMC3462201 DOI: 10.1371/journal.pone.0046450] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 07/16/2012] [Accepted: 08/30/2012] [Indexed: 11/19/2022] Open
Abstract
The advent of next-generation sequencing technologies has greatly advanced the field of metagenomics, which studies genetic material recovered directly from an environment. Characterization of the genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantitated through homology searches of sequence reads against known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, thereby compromising accurate estimates of the relative abundance of multiple genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After obtaining the estimated proportion of reads generated by each genome, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability by taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach of taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/hji403/MetaR.htm.
|
25
|
Technical reproducibility of genotyping SNP arrays used in genome-wide association studies. PLoS One 2012; 7:e44483. [PMID: 22970228 PMCID: PMC3436888 DOI: 10.1371/journal.pone.0044483] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 06/08/2012] [Accepted: 08/08/2012] [Indexed: 01/25/2023] Open
Abstract
During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each of the identified genetic variants only explains a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordance observed in results between independent GWAS indicates the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in using SNP data and interpreting GWAS results. Therefore, reproducibility of two widely used genotyping technology platforms from Affymetrix and Illumina was assessed by analyzing four technical replicates from each of the six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low quality data were detected when comparing genotyping data from technical replicates, but they could not be detected according to vendors' quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e. the odds ratio) and the minor allele frequencies are low.
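Genotype concordance between two technical replicates, as reported above, is the fraction of SNPs that receive the same call in both. The following is a minimal sketch on toy data. The abstract does not say how no-calls were treated, so excluding them is an assumption here, and the `"NN"` no-call code and the function name are hypothetical.

```python
def genotype_concordance(calls_a, calls_b, missing="NN"):
    """Fraction of SNPs with identical genotype calls in two replicates.

    SNPs where either replicate has a no-call (`missing`) are excluded;
    this handling is an assumption, not the study's documented procedure.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs in the same order")
    pairs = [(x, y) for x, y in zip(calls_a, calls_b)
             if x != missing and y != missing]
    if not pairs:
        raise ValueError("no SNPs called in both replicates")
    return sum(x == y for x, y in pairs) / len(pairs)


# Toy data: five SNPs, one no-call, one discordant call.
replicate_1 = ["AA", "AB", "BB", "AB", "NN"]
replicate_2 = ["AA", "AB", "AB", "AB", "BB"]

concordance = genotype_concordance(replicate_1, replicate_2)
print(f"concordance = {concordance:.2%}")  # 3 of 4 comparable SNPs agree
```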
|
26
|
Association of breast cancer in men with exposure to 5-α reductase inhibitors: A RADAR report. J Clin Oncol 2012. [DOI: 10.1200/jco.2012.30.15_suppl.2532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
2532 Background: Breast cancers in men (BCM) account for <1% of all breast cancers. Dihydrotestosterone (DHT) inhibits proliferation of normal and neoplastic mammary tissue and constrains the effect of estrogens. Finasteride (F) and dutasteride (D) are 5-α reductase inhibitors (5-αRIs) that reduce systemic and local dihydrotestosterone and cause gynecomastia in 1–3% of men. The package inserts for F and D state, “the relationship between long-term use of (finasteride/dutasteride) and male breast neoplasia is currently unknown.” F and D are marketed for treatment of symptomatic benign prostatic hyperplasia. F is marketed for treatment of androgenetic alopecia. Methods: To detect disproportionality in the FDA MedWatch dataset, we calculated the empiric Bayes geometric mean (EBGM) for association of BCM with F or D. We also calculated the attributable risk of BCM exposed to F or D among men at an urban academic hospital (Northwestern Memorial Hospital) and at a rural healthcare system (Marshfield Clinic). Results: In the MedWatch dataset, we identified 33 reports of F-associated BCM and 5 reports of D-associated BCM. For F-associated BCM, the EBGM was 58.95 (95% CI 24.47-81.76; p=0.0001). For D-associated BCM, the EBGM was 15.79 (95% CI 4.57-35.49; p=0.0001). The mean age for BCM after 5-αRI exposure was 70±11 years; 11/38 (29%) had gynecomastia. There were 38 cases of BCM associated with 5-αRI in the combined Northwestern and Marshfield cohort (see table below). Conclusions: We found a highly significant association between BCM and 5-αRI exposure in each of 6 separate analyses (3 sources × 2 drugs), with an estimated 1 extra BCM per 564 men exposed to 5-αRIs. We now plan to assess BRCA status and other risk factors. Given that 5-αRIs are marketed for control of lower urinary tract symptoms or for cosmetic purposes, it is not immediately obvious that use of finasteride or dutasteride for their labeled indications would provide any net benefit. [Table: see text]
|
27
|
Genome-wide DNA methylation indicates silencing of tumor suppressor genes in uterine leiomyoma. PLoS One 2012; 7:e33284. [PMID: 22428009 PMCID: PMC3302826 DOI: 10.1371/journal.pone.0033284] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Received: 09/01/2011] [Accepted: 02/10/2012] [Indexed: 02/04/2023] Open
Abstract
Background Uterine leiomyomas, or fibroids, represent the most common benign tumor of the female reproductive tract. Fibroids become symptomatic in 30% of all women and up to 70% of African American women of reproductive age. Epigenetic dysregulation of individual genes has been demonstrated in leiomyoma cells; however, the in vivo genome-wide distribution of such epigenetic abnormalities remains unknown. Principal Findings We characterized and compared genome-wide DNA methylation and mRNA expression profiles in uterine leiomyoma and matched adjacent normal myometrial tissues from 18 African American women. We found 55 genes with differential promoter methylation and concomitant differences in mRNA expression in uterine leiomyoma versus normal myometrium. Eighty percent of the identified genes showed an inverse relationship between DNA methylation status and mRNA expression in uterine leiomyoma tissues, and the majority of genes (62%) displayed hypermethylation associated with gene silencing. We selected three genes, the known tumor suppressors KLF11, DLEC1, and KRT19, and verified promoter hypermethylation, mRNA repression and protein expression using bisulfite sequencing, real-time PCR and western blot. Incubation of primary leiomyoma smooth muscle cells with a DNA methyltransferase inhibitor restored KLF11, DLEC1 and KRT19 mRNA levels. Conclusions These results suggest a possible functional role of promoter DNA methylation-mediated gene silencing in the pathogenesis of uterine leiomyoma in African American women.
|
28
|
Mining the Gene Wiki for functional genomic knowledge. BMC Genomics 2011; 12:603. [PMID: 22165947 PMCID: PMC3271090 DOI: 10.1186/1471-2164-12-603] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/06/2011] [Accepted: 12/13/2011] [Indexed: 11/26/2022] Open
Abstract
Background Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki is comprised of more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology. Results Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses. Conclusions The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.
|
29
|
An early decrease in serum HBeAg titre is a strong predictor of virological response to entecavir in HBeAg-positive patients. J Viral Hepat 2011; 18:e184-90. [PMID: 21692931 DOI: 10.1111/j.1365-2893.2010.01423.x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022]
Abstract
Quantification of HBeAg levels has been found to be useful in monitoring and predicting the outcomes of interferon and lamivudine treatment in HBeAg-positive patients. The aim of this study was to determine whether quantification of HBeAg at baseline and on treatment could predict which patients would achieve HBeAg seroconversion after 96 weeks of entecavir therapy. Sixty-five HBeAg-positive naïve chronic hepatitis B patients who were treated with entecavir at a dose of 0.5 mg once daily for 96 weeks were evaluated. Serum HBV DNA levels were assessed at baseline and at weeks 24, 48 and 96; serum HBeAg levels were assessed at baseline and at weeks 12, 24, 48, 72 and 96. Serum HBeAg levels were associated with a higher likelihood of HBeAg seroconversion to entecavir at week 96 than serum HBV DNA levels both at baseline and on treatment (at baseline: OR = 9.932, P = 0.003 vs. OR = 5.045, P = 0.036; on treatment: OR = 112.5, P < 0.0001 vs. OR = 47.782, P < 0.0001). A maintained reduction in HBeAg > 65% of pretreatment HBeAg values after 24 weeks of entecavir therapy is the strongest predictor for HBeAg seroconversion at week 96 (OR = 70.578, P < 0.0001). Quantification of HBeAg at the start and early during therapy showed a higher predictive value than that of HBV DNA for HBeAg seroconversion by entecavir. A significant decrease in serum HBeAg levels at week 24 may be a useful on-treatment measurement in the early phase for predicting HBeAg seroconversion and identifying patients who will most likely benefit from finite entecavir treatment.
|
30
|
Abstract
Fibroproliferative scars are an important clinical problem, and yet the mechanisms that regulate scar formation remain poorly understood. This study explored the hypothesis that the epithelium has a critical role in dictating scar formation, and that these interactions differ in skin and mucosa. Paired skin and vaginal mucosal wounds on New Zealand white (NZW) rabbits diverged significantly; the cutaneous epithelium exhibited a greater and prolonged response to injury when compared with the mucosa. Microarray analysis of the injured epithelium was performed, and numerous factors were identified that were more strongly upregulated in skin, including several proinflammatory cytokines and profibrotic growth factors. Analysis of the underlying mesenchymal tissue demonstrated a fibrotic response in the dermis of the skin but not the mucosal lamina propria, in the absence of a connective tissue injury. To determine if the proinflammatory factors produced by the epidermis may have a role in dermal fibrosis, an IL-1 receptor antagonist was administered locally to healing skin wounds. In the NZW rabbit model, blockade of IL-1 signaling was effective in preventing hypertrophic scar formation. These results support the idea that soluble factors produced by the epithelium in response to injury may influence fibroblast behavior and regulate scar formation in vivo.
|
31
|
Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 2010; 11:587. [PMID: 21118553 PMCID: PMC3012676 DOI: 10.1186/1471-2105-11-587] [Citation(s) in RCA: 1332] [Impact Index Per Article: 95.1] [Received: 08/27/2010] [Accepted: 11/30/2010] [Indexed: 01/03/2023] Open
Abstract
Background High-throughput profiling of DNA methylation status of CpG islands is crucial to understand the epigenetic regulation of genes. The microarray-based Infinium methylation assay by Illumina is one platform for low-cost high-throughput methylation profiling. Both Beta-value and M-value statistics have been used as metrics to measure methylation levels. However, there are no detailed studies of their relations and their strengths and limitations. Results We demonstrate that the relationship between the Beta-value and M-value methods is a Logit transformation, and show that the Beta-value method has severe heteroscedasticity for highly methylated or unmethylated CpG sites. In order to evaluate the performance of the Beta-value and M-value methods for identifying differentially methylated CpG sites, we designed a methylation titration experiment. The evaluation results show that the M-value method provides much better performance in terms of Detection Rate (DR) and True Positive Rate (TPR) for both highly methylated and unmethylated CpG sites. Imposing a minimum threshold of difference can improve the performance of the M-value method but not the Beta-value method. We also provide guidance for how to select the threshold of methylation differences. Conclusions The Beta-value has a more intuitive biological interpretation, but the M-value is more statistically valid for the differential analysis of methylation levels. Therefore, we recommend using the M-value method for conducting differential methylation analysis and including the Beta-value statistics when reporting the results to investigators.
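The logit relationship between the two metrics can be illustrated with a short conversion routine. This is a minimal sketch assuming the base-2 logit conventionally used for M-values, M = log2(Beta / (1 - Beta)), and ignoring any offset terms used when the values are computed from raw intensities. The function names are illustrative and do not come from any package.

```python
import math


def beta_to_m(beta):
    """Convert a Beta-value in (0, 1) to an M-value via the base-2 logit."""
    if not 0.0 < beta < 1.0:
        raise ValueError("Beta-value must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse transform: map an M-value back to a Beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# Equal Beta steps correspond to very different M steps: the M scale is
# stretched near 0 and 1, the range where the abstract reports severe
# heteroscedasticity for Beta-values.
for beta in (0.01, 0.10, 0.50, 0.90, 0.99):
    print(f"Beta = {beta:.2f}  ->  M = {beta_to_m(beta):+.3f}")
```

A Beta-value of 0.5 maps to M = 0, and the transform is antisymmetric about that point, so Beta-values of 0.1 and 0.9 give M-values of equal magnitude and opposite sign.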
|
32
|
Assessing sources of inconsistencies in genotypes and their effects on genome-wide association studies with HapMap samples. THE PHARMACOGENOMICS JOURNAL 2010; 10:364-74. [PMID: 20368714 PMCID: PMC2928027 DOI: 10.1038/tpj.2010.24] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 12/11/2009] [Accepted: 02/15/2010] [Indexed: 01/05/2023]
Abstract
The discordance in results of independent genome-wide association studies (GWAS) indicates the potential for Type I and Type II errors. We assessed the repeatability of current Affymetrix technologies that support GWAS. Reasonable reproducibility was observed for both raw intensity and the genotypes/copy number variants. We also assessed consistencies between different SNP arrays and between genotype calling algorithms. We observed that the inconsistency in genotypes was generally small at the specimen level. To further examine whether the differences from genotyping and genotype calling are possible sources of variation in GWAS results, an association analysis was applied to compare the associated SNPs. We observed that the inconsistency in genotypes not only propagated to the association analysis, but was amplified in the associated SNPs. Our studies show that inconsistencies between SNP arrays and between genotype calling algorithms are potential sources for the lack of reproducibility in GWAS results.
|
33
|
ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 2010; 11:237. [PMID: 20459804 PMCID: PMC3098059 DOI: 10.1186/1471-2105-11-237] [Citation(s) in RCA: 711] [Impact Index Per Article: 50.8] [Received: 12/09/2009] [Accepted: 05/11/2010] [Indexed: 12/04/2022] Open
Abstract
Background Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. 
Allowing users to pass their own annotation data such as a different Chromatin immunoprecipitation (ChIP) preparation and a dataset from literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration to the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
|
34
|
A collection of bioconductor methods to visualize gene-list annotations. BMC Res Notes 2010; 3:10. [PMID: 20180973 PMCID: PMC2829581 DOI: 10.1186/1756-0500-3-10] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 08/25/2009] [Accepted: 01/19/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gene-list annotations are critical for researchers to explore the complex relationships between genes and functionalities. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As such, potentially biologically important complexities such as one gene belonging to multiple annotation categories are difficult to extract. We have devised explicit and efficient visualization methods that provide intuitive ways of interrogating the intrinsic connections between biological categories and genes. FINDINGS We have constructed a data model and now present two novel methods in a Bioconductor package, "GeneAnswers", to simultaneously visualize genes, concepts (a.k.a. annotation categories), and concept-gene connections (a.k.a. annotations): the "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation". These methods have been tested and validated with microarray-derived gene lists. CONCLUSIONS These new visualization methods can effectively present annotations using Gene Ontology, Disease Ontology, or any other user-defined gene annotations that have been pre-associated with an organism's genome by human curation, automated pipelines, or a combination of the two. The gene-annotation data model and associated methods are available in the Bioconductor package called "GeneAnswers" described in this publication.
|
35
|
From disease ontology to disease-ontology lite: statistical methods to adapt a general-purpose ontology for the test of gene-ontology associations. Bioinformatics 2009; 25:i63-8. [PMID: 19478018 PMCID: PMC2687947 DOI: 10.1093/bioinformatics/btp193] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.0] [Indexed: 01/25/2023] Open
Abstract
Subjective methods have been reported to adapt a general-purpose ontology for a specific application. For example, Gene Ontology (GO) Slim was created from GO to generate a highly aggregated report of the human-genome annotation. We propose statistical methods to adapt the general-purpose OBO Foundry Disease Ontology (DO) for the identification of gene-disease associations. Thus, we need a simplified definition of disease categories derived from implicated genes. On the basis of the assumption that the DO terms having similar associated genes are closely related, we group the DO terms based on the similarity of gene-to-DO mapping profiles. Two types of binary distance metrics are defined to measure the overall and subset similarity between DO terms. A compactness-scalable fuzzy clustering method is then applied to group similar DO terms. To reduce false clustering, the semantic similarities between DO terms are also used to constrain clustering results. As such, the DO terms are aggregated and the redundant DO terms are largely removed. Using these methods, we constructed a simplified vocabulary list from the DO called Disease Ontology Lite (DOLite). We demonstrated that DOLite yields more interpretable results than DO for gene-disease association tests. The resultant DOLite has been used in the Functional Disease Ontology (FunDO) Web application at http://www.projects.bioinformatics.northwestern.edu/fundo. Contact: s-lin2@northwestern.edu
|
36
|
Abstract
Background The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease-containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of the human genome.
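The recall and precision figures quoted above compare a set of predicted gene-disease annotations against a reference set. The sketch below shows the two measures on toy data. The gene and disease identifiers are placeholders, not entries from GeneRIF, OMIM, or the validation collection.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two sets of (gene, disease) pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical annotations: two of three predictions are correct, and
# two of four reference annotations are recovered.
predicted_pairs = {("GENE1", "disease_a"), ("GENE2", "disease_b"),
                   ("GENE3", "disease_c")}
reference_pairs = {("GENE1", "disease_a"), ("GENE2", "disease_b"),
                   ("GENE4", "disease_d"), ("GENE5", "disease_e")}

precision, recall = precision_recall(predicted_pairs, reference_pairs)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

A source can score high on precision while missing most reference annotations, which is the pattern the abstract reports for OMIM (98% precision, 22% recall).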
|
37
|
BALB/c mice genetically susceptible to proteoglycan-induced arthritis and spondylitis show colony-dependent differences in disease penetrance. Arthritis Res Ther 2009; 11:R21. [PMID: 19220900 PMCID: PMC2688253 DOI: 10.1186/ar2613] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 12/05/2008] [Revised: 01/31/2009] [Accepted: 02/16/2009] [Indexed: 02/06/2023]
Abstract
Introduction The major histocompatibility complex (H-2d) and non-major histocompatibility complex genetic backgrounds make the BALB/c strain highly susceptible to inflammatory arthritis and spondylitis. Although different BALB/c colonies develop proteoglycan-induced arthritis and proteoglycan-induced spondylitis in response to immunization with human cartilage proteoglycan, they show significant differences in disease penetrance despite being maintained by the same vendor at either the same or a different location. Methods BALB/c female mice (24 to 26 weeks old after 4 weeks of acclimatization) were immunized with a suboptimal dose of cartilage proteoglycan to explore even minute differences among 11 subcolonies purchased from five different vendors. In vitro-measured T-cell responses, and serum cytokines and (auto)antibodies were correlated with arthritis (and spondylitis) phenotypic scores. cDNA microarrays were also performed using spleen cells of naïve and immunized BALB/cJ and BALB/cByJ mice (both colonies from The Jackson Laboratory, Bar Harbor, ME, USA), which represent the two major BALB/c sublines. Results The 11 BALB/c colonies could be separated into high (n = 3), average (n = 6), and low (n = 2) responder groups based upon their arthritis scores. While the clinical phenotypes showed significant differences, only a few immune parameters correlated with clinical or histopathological abnormalities, and seemingly none of them affected differences found in altered clinical phenotypes (onset time, severity or incidence of arthritis, or severity and progression of spondylitis). Affymetrix assay (Affymetrix, Santa Clara, CA, USA) explored 77 differentially expressed genes (at a significant level, P < 0.05) between The Jackson Laboratory's BALB/cJ (original) and BALB/cByJ (transferred from the National Institutes of Health, Bethesda, MD, USA). 
Fourteen of the 77 differentially expressed genes had unknown function; 24 of 77 genes showed over twofold differences, and only 8 genes were induced by immunization, some in both colonies. Conclusions Using different subcolonies of the BALB/c strain, we can detect significant differences in arthritis phenotypes, single-nucleotide polymorphisms (SNPs), and a large number of differentially expressed genes, even in non-immunized animals. A number of the known genes (and SNPs) are associated with immune responses and/or arthritis in this genetically arthritis-prone murine strain, and a number of genes of as-yet-unknown function may affect or modify clinical phenotypes of arthritis and/or spondylitis.
|
38
|
Gene expression and functional studies of the optic nerve head astrocyte transcriptome from normal African Americans and Caucasian Americans donors. PLoS One 2008; 3:e2847. [PMID: 18716680 PMCID: PMC2518525 DOI: 10.1371/journal.pone.0002847] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 05/13/2008] [Accepted: 07/07/2008] [Indexed: 11/19/2022]
Abstract
Purpose To determine whether optic nerve head (ONH) astrocytes, a key cellular component of glaucomatous neuropathy, exhibit differential gene expression in primary cultures of astrocytes from normal African American (AA) donors compared to astrocytes from normal Caucasian American (CA) donors. Methods We used oligonucleotide Affymetrix microarrays (HG U133A & HG U133A 2.0 chips) to compare gene expression levels in cultured ONH astrocytes from twelve CA and twelve AA age-matched normal donor eyes. Chips were normalized with Robust Multi-array Average (RMA) in R using Bioconductor. Significant differential gene expression levels were detected using mixed effects modeling and Significance Analysis of Microarrays (SAM). Functional analysis and Gene Ontology were used to classify differentially expressed genes. Differential gene expression was validated by quantitative real time RT-PCR. Protein levels were detected by Western blots and ELISA. Cell adhesion and migration assays tested physiological responses. Glutathione (GSH) assay detected levels of intracellular GSH. Results Multiple analyses selected 87 genes differentially expressed between normal AA and CA (P<0.01). The most relevant genes expressed in AA were categorized by function, including: signal transduction, response to stress, ECM genes, migration and cell adhesion. Conclusions These data show that astrocytes from normal AA and CA donors display distinct expression profiles that impact astrocyte functions in the ONH. Our data suggest that differences in gene expression in ONH astrocytes may be specific to the development and/or progression of glaucoma in AA.
|
39
|
Susceptibility to glaucoma: differential comparison of the astrocyte transcriptome from glaucomatous African American and Caucasian American donors. Genome Biol 2008; 9:R111. [PMID: 18613964 PMCID: PMC2530868 DOI: 10.1186/gb-2008-9-7-r111] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 05/09/2008] [Revised: 06/18/2008] [Accepted: 07/09/2008] [Indexed: 12/23/2022]
Abstract
Comparison of gene expression in normal and glaucomatous eyes from Caucasian American and African American donors reveals differences that might reflect different susceptibility to glaucoma. Background Epidemiological and genetic studies indicate that ethnic/genetic background plays an important role in susceptibility to primary open angle glaucoma (POAG). POAG is more prevalent among the African-descent population compared to the Caucasian population. Damage in POAG occurs at the level of the optic nerve head (ONH) and is mediated by astrocytes. Here we investigated differences in gene expression in primary cultures of ONH astrocytes obtained from age-matched normal and glaucomatous donors of Caucasian American (CA) and African American (AA) populations using oligonucleotide microarrays. Results Gene expression data were obtained from cultured astrocytes representing 12 normal CA and 12 normal AA eyes, 6 AA eyes with POAG and 8 CA eyes with POAG. Data were normalized and significant differential gene expression levels detected by using empirical Bayesian shrinkage moderated t-statistics. Gene Ontology analysis and networks of interacting proteins were constructed using the BioGRID database. Network maps included regulation of myosin, actin, and protein trafficking. Real-time RT-PCR, western blots, ELISA, and functional assays validated genes in the networks. Conclusion Cultured AA and CA glaucomatous astrocytes retain differential expression of genes that promote cell motility and migration, regulate cell adhesion, and are associated with structural tissue changes that collectively contribute to neural degeneration. Key upregulated genes include those encoding myosin light chain kinase (MYLK), transforming growth factor-β receptor 2 (TGFBR2), rho-family GTPase-2 (RAC2), and versican (VCAN). 
These genes along with other differentially expressed components of integrated networks may reflect functional susceptibility to chronic elevated intraocular pressure that is enhanced in the optic nerve head of African Americans.
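The abstract does not spell out the "empirical Bayesian shrinkage moderated t-statistics" it mentions. The block below is a sketch of the standard empirical Bayes form, given here as an assumption about what was used. The symbols are illustrative: for gene g, s_g^2 is the sample variance with d_g degrees of freedom, s_0^2 and d_0 are the prior variance and prior degrees of freedom estimated across all genes, \hat{\beta}_g is the estimated log fold change, and v_g is its unscaled variance.

```latex
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2} + d_g\, s_g^{2}}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}
```

Shrinking each gene's variance toward the pooled prior stabilizes the statistic when there are few arrays per group, as with the 6 to 12 eyes per group described above.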
|
40
|
Abstract
The Illumina microarray is becoming a popular microarray platform. The BeadArray technology from Illumina makes its preprocessing and quality control different from other microarray technologies. Unfortunately, most other analyses have not taken advantage of the unique properties of the BeadArray system, and have just incorporated preprocessing methods originally designed for Affymetrix microarrays. lumi is a Bioconductor package especially designed to process Illumina microarray data. It includes data input, quality control, variance stabilization, normalization and gene annotation portions. Specifically, the lumi package includes a variance-stabilizing transformation (VST) algorithm that takes advantage of the technical replicates available on every Illumina microarray. Different normalization method options and multiple quality control plots are provided in the package. To better annotate the Illumina data, a vendor-independent nucleotide universal identifier (nuID) was devised to identify the probes of Illumina microarrays. The nuID annotation packages and the output of lumi-processed results can be easily integrated with other Bioconductor packages to construct a statistical data analysis pipeline for Illumina data. AVAILABILITY The lumi Bioconductor package, www.bioconductor.org
|
41
|
Abstract
Variance stabilization is a step in the preprocessing of microarray data that can greatly benefit the performance of subsequent statistical modeling and inference. Due to the often limited number of technical replicates for Affymetrix and cDNA arrays, achieving variance stabilization can be difficult. Although the Illumina microarray platform provides a larger number of technical replicates on each array (usually over 30 randomly distributed beads per probe), these replicates have not been leveraged in the current log2 data transformation process. We devised a variance-stabilizing transformation (VST) method that takes advantage of the technical replicates available on an Illumina microarray. We have compared VST with log2 and Variance-stabilizing normalization (VSN) by using the Kruglyak bead-level data (2006) and Barnes titration data (2005). The results of the Kruglyak data suggest that VST stabilizes variances of bead-replicates within an array. The results of the Barnes data show that VST can improve the detection of differentially expressed genes and reduce false-positive identifications. We conclude that although both VST and VSN are built upon the same model of measurement noise, VST stabilizes the variance better and more efficiently for the Illumina platform by leveraging the availability of a larger number of within-array replicates. The algorithms and Supplementary Data are included in the lumi package of Bioconductor, available at: www.bioconductor.org.
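The abstract states only that VST and VSN share a measurement-noise model. As a sketch under an assumed quadratic model, var(Y) = (c1*u + c2)^2 + c3 at mean intensity u, the variance-stabilizing transform is the arcsinh function below. The parameter values are made up for illustration and are not fitted to any array. By the delta method, the standard deviation after transformation is approximately h'(u) * sd(Y), which should be close to 1 at every intensity.

```python
import math


def vst(y: float, c1: float, c2: float, c3: float) -> float:
    """h(y) = (1/c1) * asinh((c1*y + c2) / sqrt(c3)); assumes c1 > 0 and c3 > 0."""
    return math.asinh((c1 * y + c2) / math.sqrt(c3)) / c1


def model_sd(u: float, c1: float, c2: float, c3: float) -> float:
    """Standard deviation under the assumed model var = (c1*u + c2)^2 + c3."""
    return math.sqrt((c1 * u + c2) ** 2 + c3)


def transformed_sd(u: float, c1: float, c2: float, c3: float, eps: float = 1e-2) -> float:
    """Delta-method sd after the transform, using a central-difference slope."""
    slope = (vst(u + eps, c1, c2, c3) - vst(u - eps, c1, c2, c3)) / (2 * eps)
    return slope * model_sd(u, c1, c2, c3)


# Illustrative (not fitted) parameters: the raw sd grows with intensity,
# while the transformed sd stays near 1 across the range.
params = (0.1, 2.0, 400.0)
raw_sds = [model_sd(u, *params) for u in (50.0, 500.0, 5000.0)]
stabilized_sds = [transformed_sd(u, *params) for u in (50.0, 500.0, 5000.0)]
```

In practice c1, c2 and c3 would be estimated from the bead-level mean and standard deviation of each probe, which is where the within-array replicates mentioned above come in.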
|
42
|
A divide-and-conquer strategy to solve the out-of-memory problem of processing thousands of Affymetrix microarrays. International Journal of Computational Biology and Drug Design 2008; 1:396-405. [PMID: 20063464 DOI: 10.1504/ijcbdd.2008.022209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Out-of-memory problems are frequently encountered when processing thousands of CEL files using Bioconductor. We propose a divide-and-conquer strategy combined with randomised resampling to solve this problem. The CAMDA 2007 META-analysis data set, which contains 5896 CEL files, was used to test the approach on a typical commodity computer cluster by running established pre-processing algorithms for Affymetrix arrays in the Bioconductor package. The results were validated against a gold standard obtained by using a supercomputer. In addition to the performance improvement, the general divide-and-conquer strategy can be applied to any other normalisation algorithm without modifying the underlying implementation.
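The abstract does not describe how the batches are formed or how their results are reconciled, so the following is only a minimal sketch of the general idea. It shuffles the file list, splits it into fixed-size batches, and hands each batch to an unmodified processing routine so that only one batch is in memory at a time. `process_batch` is a hypothetical callback standing in for any existing pre-processing function.

```python
import random
from typing import Callable, Dict, List


def random_batches(files: List[str], batch_size: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the files reproducibly and cut them into batches."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def process_in_batches(
    files: List[str],
    batch_size: int,
    process_batch: Callable[[List[str]], Dict[str, object]],
    seed: int = 0,
) -> Dict[str, object]:
    """Run process_batch on one batch at a time and merge the per-file results."""
    results: Dict[str, object] = {}
    for batch in random_batches(files, batch_size, seed):
        results.update(process_batch(batch))  # the batch is released after this call
    return results


# Usage with placeholder file names and a dummy per-batch routine.
cel_files = [f"array_{i}.CEL" for i in range(10)]
summary = process_in_batches(cel_files, 4, lambda batch: {name: len(batch) for name in batch})
```

Random assignment is meant to keep each batch representative of the whole collection. Any cross-batch reconciliation step would have to be added on top of this sketch.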
|
43
|
2009 and beyond: the decade of personalised medicine. International Journal of Computational Biology and Drug Design 2008; 1:329-333. [PMID: 20063461 DOI: 10.1504/ijcbdd.2008.022205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
With a better understanding of the human genome, it is possible to tailor medical care, including prevention, diagnosis and treatment of disease, to an individual's needs. This revolution in medical care is called Personalised Medicine. This paper provides a brief review of recent progress in high-throughput measurement technologies, their effects on personalised medicine, milestone projects in human genome research and related challenges in bioinformatics and computational biology.
|
44
|
|
45
|
nuID: a universal naming scheme of oligonucleotides for Illumina, Affymetrix, and other microarrays. Biol Direct 2007; 2:16. [PMID: 17540033 PMCID: PMC1891274 DOI: 10.1186/1745-6150-2-16] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.9] [Received: 05/02/2007] [Accepted: 05/31/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Oligonucleotide probes that are sequence-identical may have different identifiers between manufacturers and even between different versions of the same company's microarray. Sometimes the same identifier is reused and represents a completely different oligonucleotide, resulting in ambiguity and potential misidentification of the genes hybridizing to that probe. RESULTS We have devised a unique, non-degenerate encoding scheme that can be used as a universal representation to identify an oligonucleotide across manufacturers. We have named the encoded representation 'nuID', for nucleotide universal identifier. Inspired by the fact that the raw sequence of the oligonucleotide is the true definition of identity for a probe, the encoding algorithm uniquely and non-degenerately transforms the sequence itself into a compact identifier (a lossless compression). In addition, we added a redundancy check (checksum) to validate the integrity of the identifier. These two steps, encoding plus checksum, result in an nuID, which is a unique, non-degenerate, permanent, robust and efficient representation of the probe sequence. For commercial applications that require the sequence identity to be confidential, we have an encryption schema for nuID. We demonstrate the utility of nuIDs for the annotation of Illumina microarrays, and we believe it has universal applicability as a source-independent naming convention for oligomers. REVIEWERS This article was reviewed by Itai Yanai, Rong Chen (nominated by Mark Gerstein), and Gregory Schuler (nominated by David Lipman).
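The two steps described above, lossless encoding plus a checksum, can be sketched as follows. This illustrates the idea only and is not necessarily the published nuID algorithm. The 64-character alphabet, the packing of three bases into one character, the modulus 21, and the leading check character that also records the padding length are all assumptions made for this example.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_."
BASES = "ACGT"  # each base is a 2-bit digit: A=0, C=1, G=2, T=3


def encode(seq: str) -> str:
    """Pack three bases per character and prefix one check character."""
    digits = [BASES.index(base) for base in seq.upper()]
    pad = (-len(digits)) % 3            # bases added to reach a multiple of 3
    digits += [0] * pad
    body = "".join(
        ALPHABET[16 * digits[i] + 4 * digits[i + 1] + digits[i + 2]]
        for i in range(0, len(digits), 3)
    )
    checksum = sum(ALPHABET.index(ch) for ch in body) % 21
    return ALPHABET[checksum + 21 * pad] + body


def decode(identifier: str) -> str:
    """Recover the sequence, raising ValueError if the check character disagrees."""
    pad, checksum = divmod(ALPHABET.index(identifier[0]), 21)
    values = [ALPHABET.index(ch) for ch in identifier[1:]]
    if pad > 2 or sum(values) % 21 != checksum:
        raise ValueError("checksum mismatch: identifier is corrupted")
    bases = "".join(
        BASES[v // 16] + BASES[(v % 16) // 4] + BASES[v % 4] for v in values
    )
    return bases[:len(bases) - pad] if pad else bases


probe = "ACGTACGTAC"
identifier = encode(probe)
round_trip = decode(identifier)
```

Because the identifier is computed from the sequence alone, two vendors that print the same oligonucleotide would arrive at the same identifier without coordinating. A single-character checksum of this kind catches many, though not all, transcription errors.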
|
46
|
Characterising phase variations in MALDI-TOF data and correcting them by peak alignment. Cancer Inform 2007; 1:32-40. [PMID: 19305630 PMCID: PMC2657651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
The use of MALDI-TOF mass spectrometry as a means of analyzing the proteome has been evaluated extensively in recent years. One of the limitations of this technique that has impeded the development of robust data analysis algorithms is the variability in the location of protein ion signals along the x-axis. We studied technical variations of MALDI-TOF measurements in the context of proteomics profiling. By acquiring a benchmark data set with five replicates, we estimated 76% to 85% of the total variance is due to phase variation. We devised a lobster plot, so named because of the resemblance to a lobster claw, to help detect the phase variation in replicates. We also investigated a peak alignment algorithm to remove the phase variation. This operation is analogous to the normalization step in microarray data analysis. Only after this critical step can features of biological interest be clearly revealed. With the help of principal component analysis, we demonstrated that after peak alignment, the differences among replicates are reduced. We compared this approach to peak alignment with a model-based calibration approach in which there was known information about peaks in common among all spectra. Finally, we examined the potential value at each point in an analysis pipeline of having a set of methods available that includes parametric, semiparametric and nonparametric methods; among such methods are those that benefit from the use of prior information.
|
47
|
Abstract
Methods are described to take a list of genes generated from a microarray experiment and interpret these results using various tools and ontologies. A workflow is described that details how to convert gene identifiers with SOURCE and MatchMiner and then use these converted gene lists to search the Gene Ontology (GO) and the Medical Subject Headings (MeSH) ontology. Examples of searching GO with DAVID, EASE, and GOMiner are provided along with an interpretation of results. The mining of MeSH using the High-Density Array Pattern Interpreter with a set of gene identifiers is also described.
|
48
|
Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics 2006; 22:2059-65. [PMID: 16820428 DOI: 10.1093/bioinformatics/btl355] [Citation(s) in RCA: 313] [Impact Index Per Article: 17.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION A major problem for current peak detection algorithms is that noise in mass spectrometry (MS) spectra gives rise to a high rate of false positives. The false positive rate is especially problematic in detecting peaks with low amplitudes. Usually, various baseline correction algorithms and smoothing methods are applied before attempting peak detection. This approach is very sensitive to the amount of smoothing and aggressiveness of the baseline correction, which contribute to making peak detection results inconsistent between runs, instrumentation and analysis methods. RESULTS Most peak detection algorithms simply identify peaks based on amplitude, ignoring the additional information present in the shape of the peaks in a spectrum. In our experience, 'true' peaks have characteristic shapes, and providing a shape-matching function that provides a 'goodness of fit' coefficient should provide a more robust peak identification method. Based on these observations, a continuous wavelet transform (CWT)-based peak detection algorithm has been devised that identifies peaks with different scales and amplitudes. By transforming the spectrum into wavelet space, the pattern-matching problem is simplified and in addition provides a powerful technique for identifying and separating the signal from the spike noise and colored noise. This transformation, with the additional information provided by the 2D CWT coefficients can greatly enhance the effective signal-to-noise ratio. Furthermore, with this technique no baseline removal or peak smoothing preprocessing steps are required before peak detection, and this improves the robustness of peak detection under a variety of conditions. The algorithm was evaluated with SELDI-TOF spectra with known polypeptide positions. Comparisons with two other popular algorithms were performed. The results show the CWT-based algorithm can identify both strong and weak peaks while keeping false positive rate low. 
AVAILABILITY The algorithm is implemented in R and will be included as an open source module in the Bioconductor project.
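The claim that no baseline removal is needed can be illustrated at a single scale. The sketch below is not the published algorithm, which matches patterns across many scales. It assumes a Mexican-hat (Ricker) wavelet and a synthetic spectrum. Because the sampled wavelet is symmetric and sums to zero, any constant or linear baseline contributes nothing to the coefficients, so the largest coefficient sits on the peak.

```python
import math
from typing import List


def ricker(scale: float, half_width: int) -> List[float]:
    """Sampled Mexican-hat wavelet, mean-subtracted so that it sums to zero."""
    w = [
        (1 - (t / scale) ** 2) * math.exp(-0.5 * (t / scale) ** 2)
        for t in range(-half_width, half_width + 1)
    ]
    mean = sum(w) / len(w)
    return [x - mean for x in w]


def cwt_row(signal: List[float], scale: float) -> List[float]:
    """Wavelet coefficients at one scale; positions near the edges are left at 0."""
    half = int(5 * scale)
    kernel = ricker(scale, half)
    coeffs = [0.0] * len(signal)
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        coeffs[i] = sum(k * s for k, s in zip(kernel, window))
    return coeffs


# Synthetic spectrum: a Gaussian peak at index 50 on a sloping baseline.
baseline = [10 + 0.5 * i for i in range(101)]
spectrum = [b + 5 * math.exp(-0.5 * ((i - 50) / 3) ** 2) for i, b in enumerate(baseline)]

coeffs = cwt_row(spectrum, scale=3)
peak_index = max(range(len(coeffs)), key=coeffs.__getitem__)
```

Repeating this over a range of scales and following the maxima from scale to scale is the usual next step for telling real peaks from spike noise. That part is omitted here.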
|
49
|
Improved prediction of treatment response using microarrays and existing biological knowledge. Pharmacogenomics 2006; 7:495-501. [PMID: 16610959 DOI: 10.2217/14622416.7.3.495] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
A desired application for microarrays in the clinic is to predict treatment response from an often diverse patient population. We present a method for analyzing microarray data that is predicated on biological pathway and function knowledge as opposed to a purely data-driven initial analysis. From an analysis perspective, this methodology takes advantage of information that is available across genes on a single array, as well as differences in those patterns across measurements. By using biological knowledge in the initial analysis, the accuracy and robustness of microarray profile classification are enhanced, especially when low numbers of samples are available. For clinical studies, particularly Phase I or I/II studies, this technique is exceptionally advantageous.
|
50
|
Association of Single Nucleotide Polymorphisms of the Insulin Gene with Chicken Early Growth and Fat Deposition. Poult Sci 2006; 85:980-5. [PMID: 16776465 DOI: 10.1093/ps/85.6.980] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
Growth rate, body composition, and fat deposition are important traits in chickens. Insulin plays important roles in hepatic cells, muscle cells, and adipose tissue cells. The purpose of the present study was to analyze the association of the insulin (INS) gene with chicken growth and body composition traits. Using an F2 design resource population constructed by crossing Chinese native Xinghua chickens and White Recessive Rock chickens, the association of 4 single nucleotide polymorphisms (SNP; A+428G, C+1549T, T+3737C, and A+3971G) of the INS gene with 13 growth and body composition traits was studied. The T+3737C genotypes were significantly associated with small intestine length (P = 0.0002), and the A+3971G genotypes were significantly associated with early growth (hatch weight and BW at 28 d of age) (P < 0.0001), breast angle (P = 0.0002), and small intestine length (P < 0.0001). None of the 4 SNP was significantly associated with abdominal fat pad weight (P > 0.05). The haplotypes based on the 4 SNP were also significantly associated with early growth (hatch weight and BW at 28 d of age; P < 0.0001) and breast angle (P < 0.0001) but not with small intestine length (P = 0.0505). These results suggested that variation of the insulin gene was significantly associated with chicken early growth but not with fat deposition. In addition, the data from the present study supported the inference that both the one-SNP-at-a-time and the haplotype-based approaches have their own advantages and disadvantages when association analysis of single SNP and haplotypes with chicken complex traits is conducted.
|