1. Boadu VG, Teye E, Lamptey FP, Amuah CLY, Sam-Amoah L. Novel authentication of African geographical coffee types (bean, roasted, powdered) by handheld NIR spectroscopic method. Heliyon 2024; 10:e35512. [PMID: 39170384 PMCID: PMC11336767 DOI: 10.1016/j.heliyon.2024.e35512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 07/25/2024] [Accepted: 07/30/2024] [Indexed: 08/23/2024] Open
Abstract
African coffee is among the best traded coffee types worldwide, and rapid identification of its geographical origin is very important when trading the commodity. This study used NIR techniques to differentiate coffee types geographically and to provide a supply chain traceability method that helps prevent fraud. Geographic differentiation of African coffee types (bean, roasted, and powder) was achieved using handheld near-infrared spectroscopy and multivariate data processing. Robusta coffee was collected from five African countries of origin. The samples were individually scanned over the wavelength range 740-1070 nm, and their spectral profiles were preprocessed with mean centering (MC), multiplicative scatter correction (MSC), and standard normal variate (SNV). Support vector machines (SVM), linear discriminant analysis (LDA), neural networks (NN), random forests (RF), and partial least squares discriminant analysis (PLS-DA) were then used to develop prediction models for African coffee types. Model performance was assessed using accuracy and F1-score. Proximate chemical composition analysis was also conducted on the raw and roasted coffee types. For raw coffee beans, the best classification models were SD-PLSDA and MC + SD-PLSDA, each with an accuracy of 0.87 and an F1-score of 0.88. SNV + SD-SVM achieved an accuracy and F1-score of 0.97 for roasted coffee beans, and MSC + SD-NN achieved 0.96 for roasted coffee powder. The results show that handheld NIR spectroscopy combined with chemometrics can support efficient quality assurance by differentiating African coffee types according to their geographical origins.
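Two of the preprocessing steps named in this abstract, mean centering (MC) and standard normal variate (SNV), can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the use of the sample standard deviation, and the absorbance values are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def mean_center(spectra):
    """Subtract the per-wavelength (column) mean from every spectrum."""
    n_wavelengths = len(spectra[0])
    col_means = [mean(row[j] for row in spectra) for j in range(n_wavelengths)]
    return [[row[j] - col_means[j] for j in range(n_wavelengths)] for row in spectra]


def snv(spectrum):
    """Standard normal variate: scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample SD (n - 1); an assumed convention
    return [(x - m) / s for x in spectrum]


# Hypothetical absorbance readings at four wavelengths (not study data).
# The second spectrum is the first multiplied by 2, mimicking a scatter effect.
spectra = [
    [0.10, 0.20, 0.30, 0.40],
    [0.20, 0.40, 0.60, 0.80],
]

snv_spectra = [snv(s) for s in spectra]
centered = mean_center(spectra)
```

Because SNV rescales each spectrum by its own mean and standard deviation, the two hypothetical spectra above become identical after SNV, which is why the method is commonly used to reduce multiplicative scatter differences between scans.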
Affiliation(s)
- Vida Gyimah Boadu
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Agriculture, Department of Agricultural Engineering, Cape Coast, Ghana
  - Akenten Appiah-Menka University of Skills Training and Entrepreneurial Development, Department of Hospitality and Tourism Education, Kumasi, Ghana
- Ernest Teye
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Agriculture, Department of Agricultural Engineering, Cape Coast, Ghana
- Francis Padi Lamptey
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Agriculture, Department of Agricultural Engineering, Cape Coast, Ghana
  - Cape Coast Technical University, Department of Food Science and Postharvest Technology, Cape Coast, Ghana
- Charles Lloyd Yeboah Amuah
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Physical Sciences, Department of Physics, Cape Coast, Ghana
- L.K. Sam-Amoah
  - University of Cape Coast, College of Agriculture and Natural Sciences, School of Agriculture, Department of Agricultural Engineering, Cape Coast, Ghana
2. Bhandari N, Walambe R, Kotecha K, Khare SP. A comprehensive survey on computational learning methods for analysis of gene expression data. Front Mol Biosci 2022; 9:907150. [PMID: 36458095 PMCID: PMC9706412 DOI: 10.3389/fmolb.2022.907150] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/29/2022] [Accepted: 09/28/2022] [Indexed: 09/19/2023] Open
Abstract
Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of gene expression data. However, more complex analysis for classification of sample observations, or discovery of feature genes requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in analysis of expression microarray data. Even though the methods are discussed in the context of expression microarrays, they can also be applied for the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values, and the methods and approaches usually employed in their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery along with their evaluation parameters are described in detail. We believe that this detailed review will help the users to select appropriate methods for preprocessing and analysis of their data based on the expected outcome.
Affiliation(s)
- Nikita Bhandari
  - Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Rahee Walambe
  - Electronics and Telecommunication Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
  - Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Ketan Kotecha
  - Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
  - Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Satyajeet P. Khare
  - Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, India
3. Rue-Albrecht K, McGettigan PA, Hernández B, Nalpas NC, Magee DA, Parnell AC, Gordon SV, MacHugh DE. GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data. BMC Bioinformatics 2016; 17:126. [PMID: 26968614 PMCID: PMC4788925 DOI: 10.1186/s12859-016-0971-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 05/01/2015] [Accepted: 02/25/2016] [Indexed: 02/06/2023] Open
Abstract
Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors.
Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples.
Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0971-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kévin Rue-Albrecht
  - Animal Genomics Laboratory, UCD School of Agriculture and Food Science, University College Dublin, Dublin 4, Ireland
  - Centre for Pharmacology and Therapeutics, Division of Experimental Medicine, Imperial College London, Hammersmith Hospital, London, W12 0NN, UK
- Paul A McGettigan
  - Animal Genomics Laboratory, UCD School of Agriculture and Food Science, University College Dublin, Dublin 4, Ireland
  - Novartis Pharmaceuticals, Elm Park Business Campus, Merrion Road, Dublin 4, Ireland
- Belinda Hernández
  - UCD School of Mathematics and Statistics, Insight Centre for Data Analytics, University College Dublin, Dublin 4, Ireland
  - UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin 4, Ireland
- Nicolas C Nalpas
  - Animal Genomics Laboratory, UCD School of Agriculture and Food Science, University College Dublin, Dublin 4, Ireland
  - Proteome Center Tübingen, Interfaculty Institute for Cell Biology, University of Tübingen, Auf der Morgenstelle 15, 72076, Tübingen, Germany
- David A Magee
  - Animal Genomics Laboratory, UCD School of Agriculture and Food Science, University College Dublin, Dublin 4, Ireland
- Andrew C Parnell
  - UCD School of Mathematics and Statistics, Insight Centre for Data Analytics, University College Dublin, Dublin 4, Ireland
- Stephen V Gordon
  - UCD School of Veterinary Medicine, University College Dublin, Dublin 4, Ireland
  - UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin 4, Ireland
- David E MacHugh
  - Animal Genomics Laboratory, UCD School of Agriculture and Food Science, University College Dublin, Dublin 4, Ireland
  - UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin 4, Ireland
4. Immune profiling with a Salmonella Typhi antigen microarray identifies new diagnostic biomarkers of human typhoid. Sci Rep 2013; 3:1043. [PMID: 23304434 PMCID: PMC3540400 DOI: 10.1038/srep01043] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.8] [Received: 10/11/2012] [Accepted: 12/07/2012] [Indexed: 11/08/2022] Open
Abstract
Current serological diagnostic assays for typhoid fever are based on detecting antibodies against Salmonella LPS or flagellum, resulting in a high false-positive rate. Here we used a protein microarray containing 2,724 Salmonella enterica serovar Typhi antigens (>63% of proteome) and identified antibodies against 16 IgG antigens and 77 IgM antigens that were differentially reactive between acute typhoid patients and healthy controls. The IgG target antigens produced a sensitivity of 97% and a specificity of 80%, whereas the IgM target antigens produced a sensitivity of 97% and a specificity of 91%. Our analyses indicated that certain features, such as membrane association, secretion, and protein expression, were significantly enriched among the reactive antigens. About 72% of the serodiagnostic antigens were within the top 25% of the ranked antigen list produced by a naïve Bayes classifier. These data provide an important resource for improved diagnostics, therapeutics and vaccine development against an important human pathogen.
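The sensitivity and specificity figures quoted in this abstract follow from standard confusion-matrix definitions. The sketch below shows the arithmetic only. The counts are hypothetical values per 100 patients and 100 controls, chosen to reproduce the reported IgG percentages; they are not the study's actual cohort sizes, and the function name is an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true cases detected
    specificity = TN / (TN + FP): fraction of controls correctly ruled out
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (NOT study data): 100 typhoid patients, 100 healthy controls.
sens, spec = sensitivity_specificity(tp=97, fn=3, tn=80, fp=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these assumed counts, a specificity of 80% corresponds to 20 false positives per 100 healthy controls, which makes the practical difference from the 91% reported for the IgM antigens easier to see.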
5. Kim SY. Prediction of hotel bankruptcy using support vector machine, artificial neural network, logistic regression, and multivariate discriminant analysis. Service Industries Journal 2011. [DOI: 10.1080/02642060802712848] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 10/19/2022]
6. Cruz-Cano R, Chew DS, Kwok-Pui C, Ming-Ying L. Least-Squares Support Vector Machine Approach to Viral Replication Origin Prediction. INFORMS Journal on Computing 2010; 22:457-470. [PMID: 20729987 PMCID: PMC2923853 DOI: 10.1287/ijoc.1090.0360] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
Replication of their DNA genomes is a central step in the reproduction of many viruses. Procedures to find replication origins, which are initiation sites of the DNA replication process, are therefore of great importance for controlling the growth and spread of such viruses. Existing computational methods for viral replication origin prediction have mostly been tested within the family of herpesviruses. This paper proposes a new approach by least-squares support vector machines (LS-SVMs) and tests its performance not only on the herpes family but also on a collection of caudoviruses coming from three viral families under the order of caudovirales. The LS-SVM approach provides sensitivities and positive predictive values superior or comparable to those given by the previous methods. When suitably combined with previous methods, the LS-SVM approach further improves the prediction accuracy for the herpesvirus replication origins. Furthermore, by recursive feature elimination, the LS-SVM has also helped find the most significant features of the data sets. The results suggest that the LS-SVMs will be a highly useful addition to the set of computational tools for viral replication origin prediction and illustrate the value of optimization-based computing techniques in biomedical applications.
Affiliation(s)
- Raul Cruz-Cano; David S.H. Chew; Choi Kwok-Pui; Leung Ming-Ying
  - Department of Computer and Information Sciences, Texas A&M University-Texarkana, Texarkana, TX, 75501, USA
  - Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546, Singapore, and Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089, USA
  - Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546, Singapore
  - Bioinformatics Program and Department of Mathematical Sciences, The University of Texas at El Paso, El Paso, TX, 79968, USA
7. Large scale immune profiling of infected humans and goats reveals differential recognition of Brucella melitensis antigens. PLoS Negl Trop Dis 2010; 4:e673. [PMID: 20454614 PMCID: PMC2864264 DOI: 10.1371/journal.pntd.0000673] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 07/27/2009] [Accepted: 03/19/2010] [Indexed: 01/18/2023] Open
Abstract
Brucellosis is a widespread zoonotic disease that is also a potential agent of bioterrorism. Current serological assays to diagnose human brucellosis in clinical settings are based on detection of agglutinating anti-LPS antibodies. To better understand the universe of antibody responses that develop after B. melitensis infection, a protein microarray was fabricated containing 1,406 predicted B. melitensis proteins. The array was probed with sera from experimentally infected goats and naturally infected humans from an endemic region in Peru. The assay identified 18 antigens differentially recognized by infected and non-infected goats, and 13 serodiagnostic antigens that differentiate human patients proven to have acute brucellosis from syndromically similar patients. There were 31 cross-reactive antigens in healthy goats and 20 cross-reactive antigens in healthy humans. Only two of the serodiagnostic antigens and eight of the cross-reactive antigens overlap between humans and goats. Based on these results, a nitrocellulose line blot containing the human serodiagnostic antigens was fabricated and applied in a simple assay that validated the accuracy of the protein microarray results in the diagnosis of humans. These data demonstrate that an experimentally infected natural reservoir host produces a fundamentally different immune response than a naturally infected accidental human host.
Brucellosis is a bacterial disease transmitted from infected animals to humans. This disease often presents as a prolonged but non-specific illness primarily characterized as fever without specific organ localization. Because infections can result after ingestion (typically from unpasteurized animal milk or milk products from goats, cattle or sheep) or inhalation (important because of bioterrorism potential) of small numbers of organisms, the bacteria that cause brucellosis are potential biological warfare agents. Here, a protein microarray containing 1,406 Brucella melitensis proteins was used to study the antibody response of experimentally infected goats and naturally infected humans in B. melitensis infection. Goats recognized 18 proteins and humans recognized 13 proteins as serodiagnostic antigens; antibody detection of only two of these antigens was shared by goats and humans, suggesting either fundamentally different immune responses or different responses in relation to mode or setting of infection. The human serodiagnostic antigens were evaluated in a simple nitrocellulose line blot assay, which validated the protein microarray results. The approach described here will lead to the development of new diagnostics for brucellosis and other infectious diseases, and aid in understanding the human and animal host immune response to pathogenic organisms.
8. Schönmann S, Loy A, Wimmersberger C, Sobek J, Aquino C, Vandamme P, Frey B, Rehrauer H, Eberl L. 16S rRNA gene-based phylogenetic microarray for simultaneous identification of members of the genus Burkholderia. Environ Microbiol 2009; 11:779-800. [PMID: 19396938 DOI: 10.1111/j.1462-2920.2008.01800.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
Abstract
For cultivation-independent and highly parallel analysis of members of the genus Burkholderia, an oligonucleotide microarray (phylochip) consisting of 131 hierarchically nested 16S rRNA gene-targeted oligonucleotide probes was developed. A novel primer pair was designed for selective amplification of a 1.3 kb 16S rRNA gene fragment of Burkholderia species prior to microarray analysis. The diagnostic performance of the microarray for identification and differentiation of Burkholderia species was tested with 44 reference strains of the genera Burkholderia, Pandoraea, Ralstonia and Limnobacter. Hybridization patterns based on presence/absence of probe signals were interpreted semi-automatically using the novel likelihood-based strategy of the web-tool Phylo-Detect. Eighty-eight per cent of the reference strains were correctly identified at the species level. The evaluated microarray was applied to investigate shifts in the Burkholderia community structure in acidic forest soil upon addition of cadmium, a condition that selected for Burkholderia species. The microarray results were in agreement with those obtained from phylogenetic analysis of Burkholderia 16S rRNA gene sequences recovered from the same cadmium-contaminated soil, demonstrating the value of the Burkholderia phylochip for determinative and environmental studies.
Affiliation(s)
- Susan Schönmann
  - Institute of Plant Biology, Department of Microbiology, University of Zurich, Zollikerstrasse 107, 8008 Zurich, Switzerland
9. Lin E, Hwang Y. A support vector machine approach to assess drug efficacy of interferon-alpha and ribavirin combination therapy. Mol Diagn Ther 2008; 12:219-23. [PMID: 18652518 DOI: 10.1007/bf03256287] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 02/02/2023]
Abstract
BACKGROUND: Interferon-alpha (IFNalpha) in combination with ribavirin can be used for the treatment of patients with chronic hepatitis C. This therapeutic approach achieves an overall sustained response rate of approximately 40%, but treatment takes 6-12 months and patients often experience significant adverse reactions.
OBJECTIVE: We aim to develop a tool to distinguish potential responders from nonresponders prior to initiation of IFNalpha-ribavirin treatment.
METHODS: Using single nucleotide polymorphisms (SNPs) and viral genotype, we applied the support vector machine (SVM) algorithm to build a tool to predict responsiveness to IFNalpha-ribavirin combination therapy. Furthermore, we utilized the SVM algorithm with the recursive feature elimination method to identify a subset of factors that are significantly more influential than the others.
RESULTS AND CONCLUSION: The SVM model is a promising method for inferring responsiveness to IFNalpha because it can deal with the complex nonlinear relationship between factors (such as SNPs and viral genotype) and successful therapy. In this study, we demonstrate that our tool may allow patients and doctors to make more informed decisions by analyzing host SNP and viral genotype information.
10. Zwick ME, Kiley MP, Stewart AC, Mateczun A, Read TD. Genotyping of Bacillus cereus strains by microarray-based resequencing. PLoS One 2008; 3:e2513. [PMID: 18596941 PMCID: PMC2438477 DOI: 10.1371/journal.pone.0002513] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 03/11/2008] [Accepted: 05/18/2008] [Indexed: 11/20/2022] Open
Abstract
The ability to distinguish microbial pathogens from closely related but nonpathogenic strains is key to understanding the population biology of these organisms. In this regard, Bacillus anthracis, the bacterium that causes inhalational anthrax, is of interest because it is closely related to, and often difficult to distinguish from, other members of the B. cereus group that can cause diverse diseases. We employed custom-designed resequencing arrays (RAs) based on the genome sequence of Bacillus anthracis to generate 422 kb of genomic sequence from a panel of 41 Bacillus cereus sensu lato strains. Here we show that RAs represent a “one reaction” genotyping technology with the ability to discriminate between highly similar B. anthracis isolates and more divergent strains of the B. cereus s.l. Clade 1. Our data show that RAs can be an efficient genotyping technology for pre-screening the genetic diversity of large strain collections to select the best candidates for whole genome sequencing.
Affiliation(s)
- Michael E Zwick
  - Biological Defense Research Directorate, Naval Medical Research Center, Silver Spring, Maryland, United States of America