1. de Mello E Silva JF, de Jesus Silva N, Carrilho TRB, Jesus Pinto ED, Rocha AS, Pedroso J, Silva SA, Spaniol AM, da Costa Santin de Andrade R, Bortolini GA, Paixão E, Kac G, de Cássia Ribeiro-Silva R, Barreto ML. Identifying biologically implausible values in big longitudinal data: an example applied to child growth data from the Brazilian food and nutrition surveillance system. BMC Med Res Methodol 2024; 24:38. [PMID: 38360575; PMCID: PMC10868032; DOI: 10.1186/s12874-024-02161-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 01/24/2024] [Indexed: 02/17/2024]
Abstract
BACKGROUND Several strategies for identifying biologically implausible values in longitudinal anthropometric data have recently been proposed, but the suitability of these strategies for large population datasets needs to be better understood. This study evaluated the impact of removing population outliers and the additional value of identifying and removing longitudinal outliers on the trajectories of length/height and weight and on the prevalence of child growth indicators in a large longitudinal dataset of child growth data. METHODS Length/height and weight measurements of children aged 0 to 59 months from the Brazilian Food and Nutrition Surveillance System were analyzed. Population outliers were identified using z-scores from the World Health Organization (WHO) growth charts. After identifying and removing population outliers, residuals from linear mixed-effects models were used to flag longitudinal outliers. The following residual cutoffs were tested for flagging them: -3/+3, -4/+4, -5/+5, -6/+6. The selected child growth indicators included length/height-for-age z-scores and weight-for-age z-scores, classified according to the WHO charts. RESULTS The dataset included 50,154,738 records from 10,775,496 children. Boys and girls had 5.74% and 5.31% of length/height and 5.19% and 4.74% of weight values flagged as population outliers, respectively. After removing those, the percentage of longitudinal outliers in boys varied from 0.02% (<-6/>+6) to 1.47% (<-3/>+3) for length/height and from 0.07% to 1.44% for weight. In girls, the percentage of longitudinal outliers varied from 0.01% to 1.50% for length/height and from 0.08% to 1.45% for weight. The initial removal of population outliers played the most substantial role in the growth trajectories as it was the first step in the cleaning process, while the additional removal of longitudinal outliers had less influence on them, regardless of the cutoff adopted. The prevalence of the selected indicators was also affected by both population outliers and, to a lesser extent, longitudinal outliers. CONCLUSIONS Although both population and longitudinal outliers can detect biologically implausible values in child growth data, removing population outliers seemed more relevant in this large administrative dataset, especially in calculating summary statistics. However, both types of outliers need to be identified and removed for the proper evaluation of trajectories.
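The two-step cleaning described in this abstract (drop population outliers first, then sweep residual cutoffs to flag longitudinal outliers) might be sketched as follows. This is an illustration only, not the authors' code: the record layout, the function names, and the example values are hypothetical, the standardized residuals are assumed to come from an already fitted mixed-effects model, and the z-score flag limits are the commonly cited WHO recommendations, which the abstract itself does not state.

```python
# Assumed flag limits (commonly cited WHO recommendations); the abstract
# does not say which limits the study actually applied.
WHO_FLAG_LIMITS = {"haz": (-6.0, 6.0), "waz": (-6.0, 5.0)}


def is_population_outlier(z, indicator):
    """Return True when a z-score falls outside the assumed flag limits."""
    low, high = WHO_FLAG_LIMITS[indicator]
    return z < low or z > high


def percent_longitudinal_outliers(std_residuals, cutoffs=(3, 4, 5, 6)):
    """Percentage of residuals beyond -c/+c for each cutoff c."""
    if not std_residuals:
        raise ValueError("no residuals supplied")
    n = len(std_residuals)
    return {
        c: 100.0 * sum(abs(r) > c for r in std_residuals) / n
        for c in cutoffs
    }


# Hypothetical records: a length/height-for-age z-score and a standardized
# residual that is assumed to come from a previously fitted model.
records = [
    {"haz": -1.2, "resid": 0.4},
    {"haz": 0.3, "resid": -3.5},
    {"haz": -7.1, "resid": 0.1},   # population outlier (below -6)
    {"haz": 6.5, "resid": 0.2},    # population outlier (above +6)
    {"haz": 2.0, "resid": 6.2},
    {"haz": -0.8, "resid": -4.4},
]

# Step 1: remove population outliers. Step 2: sweep the residual cutoffs.
kept = [r for r in records if not is_population_outlier(r["haz"], "haz")]
sweep = percent_longitudinal_outliers([r["resid"] for r in kept])
```

In a real pipeline the model would be refitted after step 1, so the residuals used in step 2 would not be known in advance as they are in this toy input.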
Affiliation(s)
- Natanael de Jesus Silva
  - Centre for Data and Knowledge Integration for Health, Gonçalo Moniz Institute, Oswaldo Cruz Foundation, Salvador, BA, Brazil
  - ISGlobal, Hospital Clínic. Universitat de Barcelona, Barcelona, Spain
- Thaís Rangel Bousquet Carrilho
  - Nutritional Epidemiology Observatory, Josué de Castro Nutrition Institute, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
  - Department of Obstetrics and Gynaecology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Elizabete de Jesus Pinto
  - Centre for Data and Knowledge Integration for Health, Gonçalo Moniz Institute, Oswaldo Cruz Foundation, Salvador, BA, Brazil
  - Federal University of Recôncavo da Bahia, Santo Antônio de Jesus, BA, Brazil
- Aline Santos Rocha
  - Centre for Data and Knowledge Integration for Health, Gonçalo Moniz Institute, Oswaldo Cruz Foundation, Salvador, BA, Brazil
  - Food and Nutrition Coordinating Unit, Ministry of Health, Brasília, DF, Brazil
- Jéssica Pedroso
  - Food and Nutrition Coordinating Unit, Ministry of Health, Brasília, DF, Brazil
- Sara Araújo Silva
  - Food and Nutrition Coordinating Unit, Ministry of Health, Brasília, DF, Brazil
- Ana Maria Spaniol
  - Food and Nutrition Coordinating Unit, Ministry of Health, Brasília, DF, Brazil
- Enny Paixão
  - London School of Hygiene & Tropical Medicine, London, UK
- Gilberto Kac
  - Nutritional Epidemiology Observatory, Josué de Castro Nutrition Institute, Federal University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Rita de Cássia Ribeiro-Silva
  - Centre for Data and Knowledge Integration for Health, Gonçalo Moniz Institute, Oswaldo Cruz Foundation, Salvador, BA, Brazil
  - School of Nutrition, Federal University of Bahia, Av. Araújo Pinho, nº 32, Canela, Salvador, Bahia, CEP: 40.110-150, BA, Brazil
- Maurício L Barreto
  - Centre for Data and Knowledge Integration for Health, Gonçalo Moniz Institute, Oswaldo Cruz Foundation, Salvador, BA, Brazil
  - Institute of Collective Health, Federal University of Bahia, Salvador, BA, Brazil
2. Boone-Heinonen J, Tillotson CJ, O'Malley JP, Marino M, Andrea SB, Brickman A, DeVoe J, Puro J. Not so implausible: impact of longitudinal assessment of implausible anthropometric measures on obesity prevalence and weight change in children and adolescents. Ann Epidemiol 2019; 31:69-74.e5. [PMID: 30799202; DOI: 10.1016/j.annepidem.2019.01.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/14/2018] [Revised: 12/20/2018] [Accepted: 01/13/2019] [Indexed: 11/20/2022]
Abstract
PURPOSE Implausible anthropometric measures are typically identified using population outlier definitions, conflating implausible and extreme measures. We determined the impact of a longitudinal outlier approach on prevalence of body mass index (BMI) categories and mean change in anthropometric measures in pediatric electronic health record data. METHODS We examined 996,131 observations from 147,375 children (10-18 years) in the ADVANCE Clinical Data Research Network, a national network of community health centers. Sex-stratified, mixed effects, linear spline regression modeled weight, height, and BMI as a function of age. Longitudinal outliers were defined as observations with an absolute studentized residual greater than 6; population outliers were defined by Centers for Disease Control-defined z-score thresholds. RESULTS At least 99.7% of anthropometric measures were not extreme by longitudinal or population definitions (agreement ≥ 0.995). BMI category prevalence after excluding longitudinal or population outliers differed by less than 0.1%. Among children above the 85th percentile at baseline, annual mean changes in anthropometric measures were larger in data that excluded longitudinal outliers (girls: 1.24 inches, 12.39 pounds, 1.53 kg/m²; boys: 2.34, 14.08, 1.07) than in data that excluded population outliers (girls: 0.61 inches, 8.22 pounds, 0.75 kg/m²; boys: 1.53, 11.61, 0.48). CONCLUSIONS Longitudinal outlier methods may reduce underestimation of anthropometric change in children with elevated baseline values.
3. Shi J, Korsiak J, Roth DE. New approach for the identification of implausible values and outliers in longitudinal childhood anthropometric data. Ann Epidemiol 2018; 28:204-211.e3. [PMID: 29398298; PMCID: PMC5840491; DOI: 10.1016/j.annepidem.2018.01.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/06/2017] [Revised: 11/23/2017] [Accepted: 01/08/2018] [Indexed: 11/03/2022]
Abstract
PURPOSE We aimed to demonstrate the use of jackknife residuals to take advantage of the longitudinal nature of available growth data in assessing potential biologically implausible values and outliers. METHODS Artificial errors were induced in 5% of length, weight, and head circumference measurements, measured on 1211 participants from the Maternal Vitamin D for Infant Growth (MDIG) trial from birth to 24 months of age. Each child's sex- and age-standardized z-score or raw measurements were regressed as a function of age in child-specific models. Each error responsible for a biologically implausible decrease between a consecutive pair of measurements was identified based on the higher of the two absolute values of jackknife residuals in each pair. In further analyses, outliers were identified as those values beyond fixed cutoffs of the jackknife residuals (e.g., greater than +5 or less than -5 in primary analyses). Kappa, sensitivity, and specificity were calculated over 1000 simulations to assess the ability of the jackknife residual method to detect induced errors and to compare these methods with the use of conditional growth percentiles and conventional cross-sectional methods. RESULTS Among the induced errors that resulted in a biologically implausible decrease in measurement between two consecutive values, the jackknife residual method identified the correct value in 84.3%-91.5% of these instances when applied to the sex- and age-standardized z-scores, with kappa values ranging from 0.685 to 0.795. Sensitivity and specificity of the jackknife method were higher than those of the conditional growth percentile method, but specificity was lower than for conventional cross-sectional methods. CONCLUSIONS Using jackknife residuals provides a simple method to identify biologically implausible values and outliers in longitudinal child growth data sets in which each child contributes at least 4 serial measurements.
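For readers unfamiliar with the term, a jackknife (externally studentized) residual divides each residual by a standard error estimated with that observation left out. A minimal sketch for a single child is given below, assuming a straight-line ordinary least squares fit of z-score on age. The abstract does not state the functional form of the child-specific models, and the function name and example values here are hypothetical.

```python
import math


def jackknife_residuals(ages, values):
    """Jackknife residuals from a straight-line OLS fit of values on ages.

    Assumes one child's measurements and a two-parameter model (intercept
    and slope), so at least 4 points are needed for the leave-one-out
    variance to have a positive number of degrees of freedom.
    """
    n = len(ages)
    if n < 4 or n != len(values):
        raise ValueError("need at least 4 paired measurements")
    mean_x = sum(ages) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    resid = [y - (intercept + slope * x) for x, y in zip(ages, values)]
    # Leverage of each point in simple linear regression.
    leverage = [1.0 / n + (x - mean_x) ** 2 / sxx for x in ages]
    sse = sum(e ** 2 for e in resid)

    out = []
    for e, h in zip(resid, leverage):
        if 1.0 - h < 1e-12:
            out.append(float("nan"))  # point fully determines its own fit
            continue
        # Residual variance of the fit with this observation left out.
        s2_loo = (sse - e ** 2 / (1.0 - h)) / (n - 3)
        if s2_loo <= 0:
            out.append(float("nan"))  # remaining points fit exactly
        else:
            out.append(e / math.sqrt(s2_loo * (1.0 - h)))
    return out


# Hypothetical child: ages in months and z-scores with one suspicious value.
ages = [0, 3, 6, 9, 12, 18]
zscores = [-0.5, -0.4, -0.6, 2.9, -0.5, -0.4]
res = jackknife_residuals(ages, zscores)
flagged = [i for i, r in enumerate(res) if abs(r) > 5]  # cutoff of -5/+5
```

Because the suspicious point is excluded from its own variance estimate, it stands out far more sharply than it would with an ordinary standardized residual, where it would inflate the denominator.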
Affiliation(s)
- Joy Shi
  - Centre for Global Child Health and SickKids Research Institute, Hospital for Sick Children, Toronto, ON, Canada
- Jill Korsiak
  - Centre for Global Child Health and SickKids Research Institute, Hospital for Sick Children, Toronto, ON, Canada
- Daniel E Roth
  - Centre for Global Child Health and SickKids Research Institute, Hospital for Sick Children, Toronto, ON, Canada
  - Department of Pediatrics, Hospital for Sick Children and University of Toronto, Toronto, ON, Canada