1
|
Avery L, Rotondi M. Evaluation of Respondent-Driven Sampling Prevalence Estimators Using Real-World Reported Network Degree. Sociological Methodology 2023; 53:269-287. [PMID: 37456805 PMCID: PMC10338697 DOI: 10.1177/00811750231163832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2023]
Abstract
Respondent-driven sampling (RDS) is used to measure trait or disease prevalence in populations that are difficult to reach and often marginalized. The authors evaluated the performance of RDS estimators under varying conditions of trait prevalence, homophily, and relative activity. They used large simulated networks (N = 20,000) derived from real-world RDS degree reports and an empirical Facebook network (N = 22,470) to evaluate estimators of binary and categorical trait prevalence. Variability in prevalence estimates is higher when network degree is drawn from real-world samples than from the commonly assumed Poisson distribution, resulting in lower coverage rates. Newer estimators perform well when the sample is a substantive proportion of the population, but bias is present when the population size is unknown. The choice of preferred RDS estimator needs to be study specific, considering both statistical properties and knowledge of the population under study.
Affiliation(s)
- Lisa Avery
- Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
|
2
|
Grubb J, Lopez D, Mohan B, Matta J. Network centrality for the identification of biomarkers in respondent-driven sampling datasets. PLoS One 2021; 16:e0256601. [PMID: 34428228 PMCID: PMC8384166 DOI: 10.1371/journal.pone.0256601] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/28/2021] [Accepted: 08/10/2021] [Indexed: 12/24/2022] Open
Abstract
Network science techniques are frequently used to provide meaningful insights into the populations underlying medical and social data. This paper examines SATHCAP, a dataset related to HIV and drug use in three US cities. In particular, we use network measures such as betweenness centrality, closeness centrality, and eigenvector centrality to find central, important nodes in a network derived from SATHCAP data. We evaluate the attributes of these important nodes and create an exceptionality score based on the number of nodes that share a particular attribute. This score, along with the underlying network itself, is used to reveal insight into the attributes of groups that can be effectively targeted to slow the spread of disease. Our research confirms a known connection between homelessness and HIV, as well as between drug abuse and HIV, and shows support for the theory that individuals without easy access to transportation are more likely to be central to the spread of HIV in urban, high-risk populations.
Affiliation(s)
- Jacob Grubb
- Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America
- Derek Lopez
- Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America
- Bhuvaneshwar Mohan
- Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America
- John Matta
- Computer Science Department, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America
|
3
|
Kramer J, Boone L, Clifford T, Bruce J, Matta J. Analysis of Medical Data Using Community Detection on Inferred Networks. IEEE J Biomed Health Inform 2020; 24:3136-3143. [PMID: 32749973 DOI: 10.1109/jbhi.2020.3003827] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
Abstract
Performing network-based analysis on medical and biological data makes a wide variety of machine learning tools available. Clustering, which can be used for classification, presents opportunities for identifying hard-to-reach groups for the development of customized health interventions. Due to a desire to convert abundant DNA gene co-expression data into networks, many graph inference methods have been developed. Likewise, there are many clustering and classification tools. This paper presents a comparison of techniques for graph inference and clustering, using different numbers of features, in order to select the best tuple of graph inference method, clustering method, and number of features according to a particular phenotype. An extensive machine learning-based analysis of the REGARDS dataset is conducted, evaluating the CoNet and K-Nearest Neighbors (KNN) network inference methods, along with the Louvain, Leiden, and NBR-Clust clustering techniques. Results from analysis involving five internal cluster evaluation indices show that the traditional KNN inference method and NBR-Clust and Louvain clustering produce the most promising clusters with medical phenotype data. It is also shown that visualization can aid in interpreting the clusters, and that the clusters produced can identify meaningful groups indicating customized interventions.
|
4
|
Green AKB, McCormick TH, Raftery AE. Consistency for the tree bootstrap in respondent-driven sampling. Biometrika 2020; 107:497-504. [PMID: 32454530 PMCID: PMC7228542 DOI: 10.1093/biomet/asz067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2019] [Indexed: 11/28/2022] Open
Abstract
Respondent-driven sampling is an approach for estimating features of populations that are difficult to access using standard survey tools, e.g., the fraction of injection drug users who are HIV positive. Baraff et al. (2016) introduced an approach to estimating uncertainty in population proportion estimates from respondent-driven sampling using the tree bootstrap method. In this paper we establish the consistency of this tree bootstrap approach in the case of $m$-trees.
Affiliation(s)
- A K B Green
- Scottish Government, Atlantic Quay, 150 Broomielaw, Glasgow G2 8LU, UK
- T H McCormick
- Department of Statistics, University of Washington, Seattle, Washington 98195-4322, USA
- A E Raftery
- Department of Statistics, University of Washington, Seattle, Washington 98195-4322, USA
|
5
|
Cheng S, Eck DJ, Crawford FW. Estimating the size of a hidden finite set: Large-sample behavior of estimators. Statistics Surveys 2020. [DOI: 10.1214/19-ss127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
|
6
|
Verdery AM, Weir S, Reynolds Z, Mulholland G, Edwards JK. Estimating Hidden Population Sizes with Venue-based Sampling: Extensions of the Generalized Network Scale-up Estimator. Epidemiology 2019; 30:901-910. [PMID: 31299014 PMCID: PMC6768707 DOI: 10.1097/ede.0000000000001059] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/05/2018] [Accepted: 06/12/2019] [Indexed: 11/27/2022]
Abstract
BACKGROUND: Researchers use a variety of population size estimation methods to determine the sizes of key populations at elevated risk of human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS), an important step in quantifying epidemic impact, advocating for high-risk groups, and planning, implementing, and monitoring prevention, care, and treatment programs. Conventional procedures often use information about sample respondents' social network contacts to estimate the sizes of key populations of interest. A recent study proposes a generalized network scale-up method that combines two samples (a traditional sample of the general population and a link-tracing sample of the hidden population) and produces more accurate results with fewer assumptions than conventional approaches. METHODS: We extended the generalized network scale-up method from link-tracing samples to samples collected with venue-based sampling designs popular in sampling key populations at risk of HIV. Our method obviates the need for a traditional sample of the general population, as long as the size of the venue-attending population is approximately known. We tested the venue-based generalized network scale-up method in a comprehensive simulation evaluation framework. RESULTS: The venue-based generalized network scale-up method provided accurate and efficient estimates of key population sizes, even when few members of the key population were sampled, yielding average biases below ±6% except when false-positive reporting error is high. It relies on limited assumptions and, in our tests, was robust to numerous threats to inference. CONCLUSIONS: Key population size estimation is vital to the successful implementation of efforts to combat HIV/AIDS. Venue-based network scale-up approaches offer another tool that researchers and policymakers can apply to these problems.
Affiliation(s)
- Ashton M. Verdery
- Department of Sociology and Criminology, The Pennsylvania State University, University Park, PA
- Sharon Weir
- Department of Epidemiology, The Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC
- Zahra Reynolds
- Department of Epidemiology, The Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC
- Grace Mulholland
- Department of Epidemiology, The Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC
- Jessie K. Edwards
- Department of Epidemiology, The Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, NC
|
7
|
Zeng L, Li J, Crawford FW. Empirical evidence of recruitment bias in a network study of people who inject drugs. The American Journal of Drug and Alcohol Abuse 2019; 45:460-469. [PMID: 30896982 DOI: 10.1080/00952990.2019.1584203] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Abstract
Background: Epidemiologic surveys of people who inject drugs (PWID) can be difficult to conduct because potential participants may fear exposure or legal repercussions. Respondent-driven sampling (RDS) is a procedure in which subjects recruit their eligible social contacts. The statistical validity of RDS surveys of PWID and other risk groups depends on subjects recruiting at random from among their network contacts. Objectives: We sought to develop and apply a rigorous definition and statistical tests for uniform network recruitment in an RDS survey. Methods: We undertook a detailed study of recruitment bias in a unique RDS study of PWID in Hartford, CT, USA, in which the network, individual-level covariates, and social link attributes were recorded. A total of n = 527 participants (402 male, 123 female, and two individuals who did not specify their gender) within a network of 2626 PWID were recruited. Results: We found strong evidence of recruitment bias with respect to age, homelessness, and social relationship characteristics. In the discrete model, the estimated hazard ratios regarding the significant features of recruitment time and choice of recruitee were: alter's age 1.03 [1.02, 1.05], alter's crack-using status 0.70 [0.50, 1.00], homelessness difference 0.61 [0.43, 0.87], and sharing activities in drug preparation 2.82 [1.39, 5.72]. Under both the discrete and continuous-time recruitment regression models, we reject the null hypothesis of uniform recruitment. Conclusions: The results provide evidence that, for this study population of PWID, recruitment bias may significantly alter the sample composition, making results of RDS surveys less reliable. More broadly, RDS studies that fail to collect comprehensive network data may not be able to detect biased recruitment when it occurs.
Affiliation(s)
- Li Zeng
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Jianghong Li
- Institute for Community Research, Hartford, CT, USA
- Forrest W Crawford
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Yale School of Management, New Haven, CT, USA
|
8
|
Wu J, Crawford FW, Kim DA, Stafford D, Christakis NA. Exposure, hazard, and survival analysis of diffusion on social networks. Stat Med 2018; 37:2561-2585. [PMID: 29707798 PMCID: PMC6933552 DOI: 10.1002/sim.7658] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/03/2017] [Revised: 12/05/2017] [Accepted: 02/15/2018] [Indexed: 11/09/2022]
Abstract
Sociologists, economists, epidemiologists, and others recognize the importance of social networks in the diffusion of ideas and behaviors through human societies. To measure the flow of information on real-world networks, researchers often conduct comprehensive sociometric mapping of social links between individuals and then follow the spread of an "innovation" from reports of adoption or change in behavior over time. The innovation is introduced to a small number of individuals who may also be encouraged to spread it to their network contacts. In conjunction with the known social network, the pattern of adoptions gives researchers insight into the spread of the innovation in the population and factors associated with successful diffusion. Researchers have used widely varying statistical tools to estimate these quantities, and there is disagreement about how to analyze diffusion on fully observed networks. Here, we describe a framework for measuring features of diffusion processes on social networks using the epidemiological concepts of exposure and competing risks. Given a realization of a diffusion process on a fully observed network, we show that classical survival regression models can be adapted to estimate the rate of diffusion, and actor/edge attributes associated with successful transmission or adoption, while accounting for the topology of the social network. We illustrate these tools by applying them to a randomized network intervention trial conducted in Honduras to estimate the rate of adoption of 2 health-related interventions (multivitamins and chlorine bleach for water purification) and determine factors associated with successful social transmission.
Affiliation(s)
- Jiacheng Wu
- Department of Biostatistics, University of Washington, Seattle, WA 98105, U.S.A
- Forrest W. Crawford
- Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, U.S.A
- Department of Operations, Yale School of Management, New Haven, CT 06511, U.S.A
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, U.S.A
- David A. Kim
- Department of Emergency Medicine, Stanford University, Stanford, CA 94305, U.S.A
- Derek Stafford
- Department of Political Science, University of Michigan, Ann Arbor, MI 48109, U.S.A
- Nicholas A. Christakis
- Department of Ecology & Evolutionary Biology, Yale University, New Haven, CT 06511, U.S.A
- Department of Sociology, Yale University, New Haven, CT 06511, U.S.A
- Department of Medicine, Yale School of Medicine, New Haven, CT 06510, U.S.A
- Department of Biomedical Engineering, New Haven, CT 06511, U.S.A
|
9
|
Crawford FW, Wu J, Heimer R. Hidden population size estimation from respondent-driven sampling: a network approach. J Am Stat Assoc 2018; 113:755-766. [PMID: 30828120 PMCID: PMC6392194 DOI: 10.1080/01621459.2017.1285775] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 05/01/2015] [Revised: 01/01/2017] [Indexed: 12/29/2022]
Abstract
Estimating the size of stigmatized, hidden, or hard-to-reach populations is a major problem in epidemiology, demography, and public health research. Capture-recapture and multiplier methods are standard tools for inference of hidden population sizes, but they require random sampling of target population members, which is rarely possible. Respondent-driven sampling (RDS) is a survey method for hidden populations that relies on social link tracing. The RDS recruitment process is designed to spread through the social network connecting members of the target population. In this paper, we show how to use network data revealed by RDS to estimate hidden population size. The key insight is that the recruitment chain, timing of recruitments, and network degrees of recruited subjects provide information about the number of individuals belonging to the target population who are not yet in the sample. We use a computationally efficient Bayesian method to integrate over the missing edges in the subgraph of recruited individuals. We validate the method using simulated data and apply the technique to estimate the number of people who inject drugs in St. Petersburg, Russia.
Affiliation(s)
- Robert Heimer
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health
|
10
|
Crawford FW, Aronow PM, Zeng L, Li J. Identification of Homophily and Preferential Recruitment in Respondent-Driven Sampling. Am J Epidemiol 2018; 187:153-160. [PMID: 28605424 PMCID: PMC5860647 DOI: 10.1093/aje/kwx208] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 10/27/2016] [Revised: 03/08/2017] [Accepted: 03/09/2017] [Indexed: 11/12/2022] Open
Abstract
Respondent-driven sampling (RDS) is a link-tracing procedure used in epidemiologic research on hidden or hard-to-reach populations in which subjects recruit others via their social networks. Estimates from RDS studies may have poor statistical properties due to statistical dependence in sampled subjects' traits. Two distinct mechanisms account for dependence in an RDS study: homophily, the tendency for individuals to share social ties with others exhibiting similar characteristics, and preferential recruitment, in which recruiters do not recruit uniformly at random from their network alters. The different effects of network homophily and preferential recruitment in RDS studies have been a source of confusion and controversy in methodological and empirical research in epidemiology. In this work, we gave formal definitions of homophily and preferential recruitment and showed that neither is identified in typical RDS studies. We derived nonparametric identification regions for homophily and preferential recruitment and showed that these parameters were not identified unless the network took a degenerate form. The results indicated that claims of homophily or recruitment bias measured from empirical RDS studies may not be credible. We applied our identification results to a study involving both a network census and RDS on a population of injection drug users in Hartford, Connecticut (2012-2013).
Affiliation(s)
- Forrest W Crawford
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut
- Yale School of Management, New Haven, Connecticut
- Peter M Aronow
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut
- Department of Political Science, Yale University, New Haven, Connecticut
- Yale School of Management, New Haven, Connecticut
- Li Zeng
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut
- Jianghong Li
- Institute for Community Research, Hartford, Connecticut
|
11
|
Heimer R, Barbour R, Khouri D, Crawford FW, Shebl F, Aaraj E, Khoshnood K. HIV Risk, Prevalence, and Access to Care Among Men Who Have Sex with Men in Lebanon. AIDS Res Hum Retroviruses 2017; 33:1149-1154. [PMID: 28540733 DOI: 10.1089/aid.2016.0326] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/13/2022] Open
Abstract
Little is known about HIV prevalence and risk among men who have sex with men (MSM) in much of the Middle East, including Lebanon. Recent national-level surveillance has suggested an increase in HIV prevalence concentrated among men in Lebanon. We undertook a biobehavioral study to provide direct evidence for the spread of HIV. MSM were recruited by respondent-driven sampling, interviewed, and offered HIV testing anonymously at sites located in Beirut, Lebanon, from October 2014 through February 2015. The interview questionnaire was designed to obtain information on participants' sociodemographic situation, sexual behaviors, alcohol and drug use, health, HIV testing and care, and experiences of stigma and discrimination. Individuals not reporting an HIV diagnosis were offered optional, anonymous HIV testing. Among the 292 MSM recruited, we identified 36 cases of HIV (12.3%). A quarter of the MSM were born in Syria and recently arrived in Lebanon. Condom use was uncommon; 65% reported condomless sex with other men. Group sex encounters were reported by 22% of participants. Among the 32 individuals already aware of their infection, 30 were in treatment and receiving antiretroviral therapy. HIV prevalence was substantially increased over past estimates. Efforts to control future increases will have to focus on reducing specific risk behaviors and experience of stigma and abuse, especially among Syrian refugees.
Affiliation(s)
- Robert Heimer
- Department of Epidemiology of Microbial Diseases, Yale University School of Public Health, New Haven, Connecticut
- Russell Barbour
- Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut
- Forrest W. Crawford
- Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut
- Fatma Shebl
- Department of Chronic Disease Epidemiology, Yale University School of Public Health, New Haven, Connecticut
- Elie Aaraj
- Middle East and North Africa Harm Reduction Association, Sin-el-Fil, Lebanon
- Kaveh Khoshnood
- Department of Epidemiology of Microbial Diseases, Yale University School of Public Health, New Haven, Connecticut
|
12
|
Verdery AM, Fisher JC, Siripong N, Abdesselam K, Bauldry S. New Survey Questions and Estimators for Network Clustering with Respondent-Driven Sampling Data. Sociological Methodology 2017; 47:274-306. [PMID: 30337767 PMCID: PMC6191199 DOI: 10.1177/0081175017716489] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 05/17/2023]
Abstract
Respondent-driven sampling (RDS) is a popular method for sampling hard-to-survey populations that leverages social network connections through peer recruitment. While RDS is most frequently applied to estimate the prevalence of infections and risk behaviors of interest to public health, such as HIV/AIDS or condom use, it is rarely used to draw inferences about the structural properties of social networks among such populations because it does not typically collect the necessary data. Drawing on recent advances in computer science, we introduce a set of data collection instruments and RDS estimators for network clustering, an important topological property that has been linked to a network's potential for diffusion of information, disease, and health behaviors. We use simulations to explore how these estimators, originally developed for random walk samples of computer networks, perform when applied to RDS samples with characteristics encountered in realistic field settings that depart from random walks. In particular, we explore the effects of multiple seeds, without replacement versus with replacement, branching chains, imperfect response rates, preferential recruitment, and misreporting of ties. We find that clustering coefficient estimators retain desirable properties in RDS samples. This paper takes an important step toward calculating network characteristics using nontraditional sampling methods, and it expands the potential of RDS to tell researchers more about hidden populations and the social factors driving disease prevalence.
|