1
Jarquin-Yañez L, Martinez-Acuña MI, Lopez-Arevalo I, Calderon Hernandez J. "Characterization of residential proximity to sources of environmental carcinogens in clusters of Acute Lymphoblastic Leukemia in San Luis Potosi, Mexico". Environ Res 2024; 252:118790. [PMID: 38555983 DOI: 10.1016/j.envres.2024.118790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/21/2024] [Accepted: 03/23/2024] [Indexed: 04/02/2024]
Abstract
BACKGROUND Acute Lymphoblastic Leukemia (ALL) is the most prevalent neoplasia in children and teenagers in Mexico. Epidemiological data support that children living close to emissions from vehicular traffic or industrial processes are at increased risk of ALL, and the IARC states that benzene, PAHs, and PM 2.5 are well-known environmental carcinogens. However, there is a gap in linking these carcinogenic hazards with their sources and their distribution from a scenario perspective. AIM To identify ALL clusters in the population under 19 years of age and characterize the environment at the neighborhood level by integrating information on sources of carcinogenic exposure using spatial analysis techniques in the Metropolitan Area of San Luis Potosi, Mexico. METHODS We designed an ecological study that used the Kernel Density test to identify ALL clusters from incident cases in the population under 19 years of age. A multicriteria analysis was conducted to characterize the risk at the community level from carcinogenic sources. A hierarchical cluster analysis was performed to characterize risk at the individual level based on the count of carcinogenic sources within 1 km of each ALL case. RESULTS Eight clusters of carcinogenic sources were located within the five identified ALL clusters. The multicriteria analysis showed high-risk areas (by density of carcinogenic sources) within ALL clusters. CONCLUSIONS This study relied on a limited source and amount of available data on ALL cases, so selection bias is present, and residual confounding cannot be ruled out because covariates were not included. However, in this study, children living in environments with high vehicular density, gas stations, brick kilns, incinerators, commercial establishments burning biomass, or near industrial zones may be at higher risk for ALL.
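The individual-level step in the abstract (counting carcinogenic sources within 1 km of each case) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the haversine distance, and every coordinate below are assumptions chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for short distances


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_sources_within(case, sources, radius_km=1.0):
    """Count sources of each kind that lie within radius_km of one case.

    case: (lat, lon); sources: iterable of (kind, lat, lon).
    """
    counts = {}
    for kind, lat, lon in sources:
        if haversine_km(case[0], case[1], lat, lon) <= radius_km:
            counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hypothetical coordinates, invented for this example only.
example_case = (22.150, -100.980)
example_sources = [
    ("gas_station", 22.152, -100.981),  # a few hundred metres away
    ("gas_station", 22.155, -100.975),  # under 1 km away
    ("brick_kiln", 22.200, -100.980),   # several km away, excluded
]
example_counts = count_sources_within(example_case, example_sources)
```

A per-case vector of such counts (one entry per source type) is the kind of input a hierarchical cluster analysis could then group.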
Affiliation(s)
- Lizet Jarquin-Yañez
- Academic Unit of Chemical Sciences, Autonomous University of Zacatecas, Jardín Juárez 147, Centro, 98000 Zacatecas, Zac, Mexico; National Council of Humanities, Sciences and Technologies (CONAHCYT), Mexico, Mexico City
- Monica Imelda Martinez-Acuña
- Academic Unit of Chemical Sciences, Autonomous University of Zacatecas, Jardín Juárez 147, Centro, 98000 Zacatecas, Zac, Mexico
- Ivan Lopez-Arevalo
- Cinvestav Tamaulipas, Science and Technology Park TecnoTam, 87130, Victoria, Tamaulipas, Mexico
- Jaqueline Calderon Hernandez
- Center for Applied Research in Environment and Health, CIACYT-Faculty of Medicine, Autonomous University of San Luis Potosí, Avenida Sierra Leona No. 550, Lomas 2nd Section, 78210, San Luis Potosí, SLP, Mexico; Global Public Health Program, Boston College, Boston, MA, United States.
2
Ordoñez-Guillen NE, Gonzalez-Compean JL, Lopez-Arevalo I, Contreras-Murillo M, Aldana-Bobadilla E. Machine learning based study for the classification of Type 2 diabetes mellitus subtypes. BioData Min 2023; 16:24. [PMID: 37608329 PMCID: PMC10463725 DOI: 10.1186/s13040-023-00340-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Accepted: 08/07/2023] [Indexed: 08/24/2023] Open
Abstract
PURPOSE Data-driven diabetes research has increasingly explored the heterogeneity of the disease, aiming to support the development of more specific prognoses and treatments within so-called precision medicine. Recently, one of these studies found five diabetes subgroups with varying risks of complications and treatment responses. Here, we tackle the development and assessment of different models for classifying Type 2 Diabetes (T2DM) subtypes through machine learning approaches, with the aim of providing a performance comparison and new insights on the matter. METHODS We developed a three-stage methodology starting with the preprocessing of the public databases NHANES (USA) and ENSANUT (Mexico) to construct a dataset with N = 10,077 adult diabetes patient records. We used N = 2,768 records for training/validation of models and left the remaining records (N = 7,309) for testing. In the second stage, groups of observations, each one representing a T2DM subtype, were identified. We tested different clustering techniques and strategies and validated them using internal and external clustering indices, obtaining two annotated datasets, Dset A and Dset B. In the third stage, we developed different classification models by evaluating four algorithms, seven input-data schemes, and two validation settings on each annotated dataset. We also tested the obtained models using a majority-vote approach for classifying unseen patient records in the hold-out dataset. RESULTS From the independently obtained bootstrap validation for Dset A and Dset B, mean accuracies across all seven data schemes were [Formula: see text] ([Formula: see text]) and [Formula: see text] ([Formula: see text]), respectively. Best accuracies were [Formula: see text] and [Formula: see text]. Results from both validation settings were consistent. For the hold-out dataset, results were consonant with most of those obtained in the literature in terms of class proportions.
CONCLUSION The development of machine learning systems for the classification of diabetes subtypes constitutes an important task to support physicians for fast and timely decision-making. We expect to deploy this methodology in a data analysis platform to conduct studies for identifying T2DM subtypes in patient records from hospitals.
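The majority-vote step mentioned in the methods can be sketched as follows. This is only an illustration under stated assumptions: the function name, the subtype labels, and the alphabetical tie-break are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models for one patient record.

    predictions: non-empty list of labels, one per trained model.
    Ties are broken alphabetically, an arbitrary choice made for this sketch.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    winners = sorted(label for label, n in counts.items() if n == top)
    return winners[0]


# Hypothetical outputs of three models for a single unseen record.
example_label = majority_vote(["subtype_A", "subtype_B", "subtype_A"])
```

In practice the tie-break rule matters when the number of models is even; a real system might instead fall back on the model with the best validation accuracy.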
Affiliation(s)
- Nelson E Ordoñez-Guillen
- Cinvestav Tamaulipas, Carretera Victoria-Soto la Marina km 5.5, Victoria, 87130, Tamaulipas, Mexico
- Ivan Lopez-Arevalo
- Cinvestav Tamaulipas, Carretera Victoria-Soto la Marina km 5.5, Victoria, 87130, Tamaulipas, Mexico
- Miguel Contreras-Murillo
- Cinvestav Tamaulipas, Carretera Victoria-Soto la Marina km 5.5, Victoria, 87130, Tamaulipas, Mexico
- Edwin Aldana-Bobadilla
- CONAHCYT-Centro de Investigación y de Estudios Avanzados del IPN, Unidad Tamaulipas, Carretera Victoria-Soto la Marina km 5.5, Victoria, Tamaulipas, 87130, Mexico
3
Rios-Alvarado AB, Martinez-Rodriguez JL, Garcia-Perez AG, Guerrero-Melendez TY, Lopez-Arevalo I, Gonzalez-Compean JL. Exploiting lexical patterns for knowledge graph construction from unstructured text in Spanish. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00805-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Abstract
Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various information domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly when processing unstructured text, as it has no semantic descriptions. Generating KGs from texts written in Spanish represents a research challenge because the existing structures, models, and strategies designed for other languages are not compatible in this scenario. This paper proposes a method to design and construct KGs from unstructured text in Spanish. We defined lexical patterns to extract named entities and (non) taxonomic, equivalence, and composition relations. Next, named entities are linked and enriched with DBpedia resources through a strategy based on SPARQL queries. Finally, OWL properties are defined from the predicate relations for creating resource description framework (RDF) triples. We evaluated the performance of the proposed method to determine the degree of elements extracted from the input text and to assess their quality through standard information retrieval measures. The evaluation revealed the feasibility of the proposed method to extract RDF triples from datasets in general and computer science domains. Competitive results were observed when comparing our method against an existing approach from the literature.
4
Crespo-Sanchez M, Lopez-Arevalo I, Aldana-Bobadilla E, Molina-Villegas A. A content spectral-based text representation. IFS 2022. [DOI: 10.3233/jifs-219248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In the last few years, text analysis has grown as a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking into account the lexical and syntactic components could make the abstraction of content more beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra from the lexical, syntactic, and semantic components of text, producing an abstract image that can be processed by both text and image learning algorithms. These components are derived from feature vectors of text. To demonstrate the merit of our proposal, we tested it on text classification and reading-complexity score prediction tasks, obtaining promising results.
Affiliation(s)
- Melesio Crespo-Sanchez
- Centro de Investigación y de Estudios Avanzados del I.P.N. Unidad Tamaulipas, Victoria, Mexico
- Ivan Lopez-Arevalo
- Centro de Investigación y de Estudios Avanzados del I.P.N. Unidad Tamaulipas, Victoria, Mexico
- Edwin Aldana-Bobadilla
- Conacyt - Centro de Investigación y de Estudios Avanzados del I.P.N. Unidad Tamaulipas, Victoria, Mexico
5
Lopez-Arevalo I, Gonzalez-Compean JL, Hinojosa-Tijerina M, Martinez-Rendon C, Montella R, Martinez-Rodriguez JL. A WoT-Based Method for Creating Digital Sentinel Twins of IoT Devices. Sensors (Basel) 2021; 21:s21165531. [PMID: 34450973 PMCID: PMC8400860 DOI: 10.3390/s21165531] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/13/2021] [Revised: 08/06/2021] [Accepted: 08/10/2021] [Indexed: 11/24/2022]
Abstract
The data produced by sensors of IoT devices are becoming keystones for organizations to conduct critical decision-making processes. However, delivering information to these processes in real time poses two challenges for organizations: the first is achieving a constant dataflow from IoT to the cloud, and the second is enabling decision-making processes to retrieve data from dataflows in real time. This paper presents a cloud-based Web of Things method for creating digital twins of IoT devices (named sentinels). The novelty of the proposed approach is that sentinels create an abstract window for decision-making processes to: (a) find data (e.g., properties, events, and data from sensors of IoT devices) or (b) invoke functions (e.g., actions and tasks) from physical devices (PD), as well as from virtual devices (VD). In this approach, the applications and services of decision-making processes deal with sentinels instead of managing complex details associated with the PDs, VDs, and cloud computing infrastructures. A prototype based on the proposed method was implemented to conduct a case study based on a blockchain system for verifying contract violation in sensors used in product transportation logistics. The evaluation showed the effectiveness of sentinels in enabling organizations to obtain data from IoT sensors and the dataflows used by decision-making processes to convert these data into useful information.
Affiliation(s)
- Raffaele Montella
- Department of Science and Technologies, University of Napoli Parthenope, 80133 Napoli, Italy
- Jose L. Martinez-Rodriguez
- Reynosa Rodhe Multidisciplinary Academic Unit, Autonomous University of Tamaulipas, Reynosa 88779, Mexico
6
Rubio-Sandoval JI, Martinez-Rodriguez JL, Lopez-Arevalo I, Rios-Alvarado AB, Rodriguez-Rodriguez AJ, Vargas-Requena DT. An Indoor Navigation Methodology for Mobile Devices by Integrating Augmented Reality and Semantic Web. Sensors (Basel) 2021; 21:s21165435. [PMID: 34450877 PMCID: PMC8401022 DOI: 10.3390/s21165435] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/09/2021] [Revised: 08/04/2021] [Accepted: 08/09/2021] [Indexed: 11/16/2022]
Abstract
Indoor navigation systems incorporating augmented reality allow users to locate places within buildings and acquire more knowledge about their environment. However, although diverse works have been introduced with varied technologies, infrastructure, and functionalities, a standardization of the procedures for elaborating these systems has not been reached. Moreover, while systems usually handle contextual information of places in proprietary formats, a platform-independent model is desirable, which would encourage its access, updating, and management. This paper proposes a methodology for developing indoor navigation systems based on the integration of Augmented Reality and Semantic Web technologies to present navigation instructions and contextual information about the environment. It comprises four modules to define a spatial model, data management (supported by an ontology), positioning and navigation, and content visualization. A mobile application system was developed for testing the proposal in academic environments, modeling the structure, routes, and places of two buildings from independent institutions. The experiments cover distinct navigation tasks by participants in both scenarios, recording data such as navigation time, position tracking, system functionality, feedback (answering a survey), and a navigation comparison when the system is not used. The results demonstrate the system's feasibility, where the participants show a positive interest in its functionalities.
Affiliation(s)
- Jesus Ivan Rubio-Sandoval
- Reynosa Rodhe Multidisciplinary Academic Unit, Autonomous University of Tamaulipas, Reynosa 88779, Mexico
- Jose L. Martinez-Rodriguez
- Reynosa Rodhe Multidisciplinary Academic Unit, Autonomous University of Tamaulipas, Reynosa 88779, Mexico
- Correspondence:
- Ivan Lopez-Arevalo
- Centro de Investigación y de Estudios Avanzados del I.P.N. Unidad Tamaulipas (Cinvestav Tamaulipas), Victoria 87130, Mexico
- Ana B. Rios-Alvarado
- Faculty of Engineering and Science, Autonomous University of Tamaulipas, Victoria 87000, Mexico
- Adolfo Josue Rodriguez-Rodriguez
- Reynosa Rodhe Multidisciplinary Academic Unit, Autonomous University of Tamaulipas, Reynosa 88779, Mexico
- David Tomas Vargas-Requena
- Reynosa Rodhe Multidisciplinary Academic Unit, Autonomous University of Tamaulipas, Reynosa 88779, Mexico
7
Lopez-Arevalo I, Aldana-Bobadilla E, Molina-Villegas A, Galeana-Zapién H, Muñiz-Sanchez V, Gausin-Valle S. A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning. Entropy (Basel) 2020; 22:e22121391. [PMID: 33316972 PMCID: PMC7763608 DOI: 10.3390/e22121391] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/26/2020] [Revised: 12/07/2020] [Accepted: 12/08/2020] [Indexed: 12/01/2022]
Abstract
The most common machine-learning methods solve supervised and unsupervised problems based on datasets where the problem’s features belong to a numerical space. However, many problems include datasets in which numerical and categorical data coexist, which makes them challenging to manage. To transform categorical data into a numeric form, preprocessing tasks are compulsory. Methods such as one-hot and feature-hashing have been the most widely used encoding approaches, at the expense of a significant increase in the dimensionality of the dataset. This effect introduces unexpected challenges in dealing with the overabundance of variables and/or noisy data. In this paper, we propose a novel encoding approach that maps mixed-type data into an information space using Shannon’s Theory to model the amount of information contained in the original data. We evaluated our proposal with ten mixed-type datasets from the UCI repository and two datasets representing real-world problems, obtaining promising results. To demonstrate its performance, we applied it to prepare these datasets for classification, regression, and clustering tasks. We demonstrate that our encoding proposal is remarkably superior to one-hot and feature-hashing encoding in terms of memory efficiency. Our proposal can preserve the information conveyed by the original data.
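To make the dimensionality argument concrete, the sketch below contrasts the number of columns one-hot encoding needs with a single-column encoding based on Shannon self-information, -log2 p(x). It is not the encoding proposed in the paper, whose details the abstract does not give; the function names and the toy column are assumptions for illustration only.

```python
import math
from collections import Counter


def one_hot_width(column):
    """Number of columns one-hot encoding needs: one per distinct category."""
    return len(set(column))


def self_information_encode(column):
    """Map each categorical value to its self-information in bits.

    The result is a single numeric column regardless of how many
    categories exist. Frequencies are estimated from the column itself.
    """
    counts = Counter(column)
    n = len(column)
    return [-math.log2(counts[value] / n) for value in column]


# Toy categorical column, invented for this example.
example_column = ["red", "red", "blue", "green"]
example_width = one_hot_width(example_column)
example_encoded = self_information_encode(example_column)
```

One visible limitation of this naive version is that equally frequent categories (here "blue" and "green") receive the same value, so it saves memory at the cost of merging them; a practical method would need a way to keep such categories distinct.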
Affiliation(s)
- Ivan Lopez-Arevalo
- Centro de Investigación y de Estudios Avanzados del I.P.N., Unidad Tamaulipas, Victoria 87130, Mexico
- Correspondence:
- Edwin Aldana-Bobadilla
- Conacyt-Centro de Investigación y de Estudios Avanzados del I.P.N., Unidad Tamaulipas, Victoria 87130, Mexico
- Hiram Galeana-Zapién
- Centro de Investigación y de Estudios Avanzados del I.P.N., Unidad Tamaulipas, Victoria 87130, Mexico
- Saul Gausin-Valle
- Centro de Investigación y de Estudios Avanzados del I.P.N., Unidad Tamaulipas, Victoria 87130, Mexico
8
Martinez-Rodriguez JL, Lopez-Arevalo I, Rios-Alvarado AB. Mining information from sentences through Semantic Web data and Information Extraction tasks. J Inf Sci 2020. [DOI: 10.1177/0165551520934387] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
The Semantic Web provides guidelines for the representation of information about real-world objects (entities) and their relations (properties). This is helpful for the dissemination and consumption of information by people and applications. However, the information is mainly contained within natural language sentences, which do not have a structure or linguistic descriptions ready to be directly processed by computers. Thus, the challenge is to identify and extract the elements of information that can be represented. Hence, this article presents a strategy to extract information from sentences and its representation with Semantic Web standards. Our strategy involves Information Extraction tasks and a hybrid semantic similarity measure to get entities and relations that are later associated with individuals and properties from a Knowledge Base to create RDF triples (Subject–Predicate–Object structures). The experiments demonstrate the feasibility of our method and that it outperforms the accuracy provided by a pattern-based method from the literature.
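As a small illustration of the Subject–Predicate–Object structure mentioned above, the sketch below serializes one triple in N-Triples syntax. The helper name, the example sentence, and the `example.org` IRIs are all hypothetical; the paper's extraction pipeline and knowledge base are not reproduced here.

```python
def to_ntriple(subject, predicate, obj):
    """Serialize one triple of IRIs as a line of N-Triples.

    Each IRI is wrapped in angle brackets and the statement ends with ' .'.
    Literals and blank nodes are deliberately not handled in this sketch.
    """
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical triple for the sentence "Marie Curie discovered polonium".
example_triple = to_ntriple(
    "http://example.org/Marie_Curie",
    "http://example.org/discovered",
    "http://example.org/Polonium",
)
```

In a full pipeline the subject and object would come from entity linking against a knowledge base and the predicate from a matched property, rather than being written by hand.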
Affiliation(s)
- Jose L. Martinez-Rodriguez
- Cinvestav-Tamaulipas, Mexico; Computer Systems Laboratory, Multidisciplinary Academic Unit Reynosa-Rodhe, Autonomous University of Tamaulipas, Mexico
9
Aldana-Bobadilla E, Kuri-Morales A, Lopez-Arevalo I, Rios-Alvarado AB. An unsupervised learning approach for multilayer perceptron networks. Soft comput 2019. [DOI: 10.1007/s00500-018-3655-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
10
Aldana-Bobadilla E, Lopez-Arevalo I, Molina-Villegas A. A novel data reduction method based on information theory and the Eclectic Genetic Algorithm. INTELL DATA ANAL 2017. [DOI: 10.3233/ida-160074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
11
Lopez-Arevalo I, Sosa-Sosa VJ, Rojas-Lopez F, Tello-Leal E. Improving selection of synsets from WordNet for domain-specific word sense disambiguation. COMPUT SPEECH LANG 2017. [DOI: 10.1016/j.csl.2016.06.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
12
Escalona-Vargas DI, Gutiérrez D, Lopez-Arevalo I. Performance of different metaheuristics in EEG source localization compared to the Cramér–Rao bound. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2013.04.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
13
Escalona-Vargas DI, Gutiérrez D, Lopez-Arevalo I. Cramér-Rao bounds on the performance of simulated annealing and genetic algorithms in EEG source localization. Annu Int Conf IEEE Eng Med Biol Soc 2011; 2011:7115-7118. [PMID: 22255978 DOI: 10.1109/iembs.2011.6091798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
In this paper, we evaluate the performance of simulated annealing (SA) and the genetic algorithm (GA) when used for electroencephalographic (EEG) source localization. The performance is evaluated on the variance of the estimated localizations as a function of the optimization's initialization parameters and the signal-to-noise ratio (SNR). We use the concentrated likelihood function (CLF) as objective function and the Cramér-Rao bound (CRB) as a reference on the performance. The CRB sets the lower limit on the variance of our estimated values. Then, our simulations on realistic EEG data show that both SA and GA are highly sensitive to noise, but adjustments on their parameters for a fixed SNR value do not improve performance significantly. Our results also confirm that SA is more sensitive to noise and its performance may be affected by correlated sources.
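For reference, the Cramér-Rao bound used as the benchmark above has the following standard form for an unbiased estimator of a parameter vector. The symbols (theta for the source parameters, x for the measured data, p for the likelihood, I for the Fisher information matrix) are generic notation assumed here, not taken from the paper.

```latex
\operatorname{cov}(\hat{\theta}) \succeq I(\theta)^{-1},
\qquad
[I(\theta)]_{ij} = -\,\mathbb{E}\!\left[
  \frac{\partial^{2} \ln p(x;\theta)}{\partial \theta_i \, \partial \theta_j}
\right],
\qquad
\operatorname{var}(\hat{\theta}_i) \ge \left[ I(\theta)^{-1} \right]_{ii}
```

The last inequality is the per-parameter statement: the variance of any unbiased location estimate, whether produced by SA or GA, cannot fall below the corresponding diagonal entry of the inverse Fisher information.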
Affiliation(s)
- D I Escalona-Vargas
- Center of Research and Advanced Studies, Cinvestav at Tamaulipas, 87130 Cd Victoria, Mexico.