1
|
Downing GJ, Tramontozzi LM, Garcia J, Villanueva E. Harnessing Internet Search Data as a Potential Tool for Medical Diagnosis: Literature Review. JMIR Ment Health 2025; 12:e63149. [PMID: 39813106 PMCID: PMC11862766 DOI: 10.2196/63149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 12/12/2024] [Accepted: 12/26/2024] [Indexed: 01/18/2025]
Abstract
BACKGROUND The integration of information technology into health care has created opportunities to address diagnostic challenges. Internet searches, representing a vast source of health-related data, hold promise for improving early disease detection. Studies suggest that patterns in search behavior can reveal symptoms before clinical diagnosis, offering potential for innovative diagnostic tools. Leveraging advancements in machine learning, researchers have explored linking search data with health records to enhance screening and outcomes. However, challenges like privacy, bias, and scalability remain critical to its widespread adoption. OBJECTIVE We aimed to explore the potential and challenges of using internet search data in medical diagnosis, with a specific focus on diseases and conditions such as cancer, cardiovascular disease, mental and behavioral health, neurodegenerative disorders, and nutritional and metabolic diseases. We examined ethical, technical, and policy considerations while assessing the current state of research, identifying gaps and limitations, and proposing future research directions to advance this emerging field. METHODS We conducted a comprehensive analysis of peer-reviewed literature and informational interviews with subject matter experts to examine the landscape of internet search data use in medical research. We searched for published peer-reviewed literature on the PubMed database between October and December 2023. RESULTS Systematic selection based on predefined criteria included 40 articles from the 2499 identified articles. The analysis revealed a nascent domain of internet search data research in medical diagnosis, marked by advancements in analytics and data integration. Despite challenges such as bias, privacy, and infrastructure limitations, emerging initiatives could reshape data collection and privacy safeguards. 
CONCLUSIONS We identified signals correlating with diagnostic considerations in certain diseases and conditions, indicating the potential for such data to enhance clinical diagnostic capabilities. However, leveraging internet search data for improved early diagnosis and health care outcomes requires effectively addressing ethical, technical, and policy challenges. By fostering interdisciplinary collaboration, advancing infrastructure development, and prioritizing patient engagement and consent, researchers can unlock the transformative potential of internet search data in medical diagnosis to ultimately enhance patient care and advance health care practice and policy.
Affiliation(s)
- Gregory J Downing
- Innovation Horizons, Inc, Washington, DC, United States
- Department of Health Systems Administration, School of Health, Georgetown University, Washington, DC, United States
|
2
|
Sadeh-Sharvit S, Fitzsimmons-Craft EE, Taylor CB, Yom-Tov E. Predicting eating disorders from Internet activity. Int J Eat Disord 2020; 53:1526-1533. [PMID: 32706444 PMCID: PMC8011598 DOI: 10.1002/eat.23338] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 01/03/2020] [Revised: 06/16/2020] [Accepted: 06/16/2020] [Indexed: 01/19/2023]
Abstract
OBJECTIVE Eating disorders (EDs) compromise the health and functioning of affected individuals, but it can often take them several years to acknowledge their illness and seek treatment. Early identification of individuals with EDs is a public health priority, and innovative approaches are needed for such identification and ultimate linkage with evidence-based interventions. This study examined whether Internet activity data can predict ED risk/diagnostic status, potentially informing timely interventions. METHOD Participants were 936 women who completed a clinically validated online survey for EDs, and 231 of them (24.7%) contributed their Internet browsing history. A machine learning algorithm used key attributes from participants' Internet activity histories to predict their ED status: clinical/subclinical ED, high risk for an ED, or no ED. RESULTS The algorithm reached an accuracy of 52.6% in predicting ED risk/diagnostic status, compared to random decision accuracy of 38.1%, a relative improvement of 38%. The most predictive Internet search history variables were the following: use of keywords related to ED symptoms and websites promoting ED content, participant age, median browsing events per day, and fraction of daily activity at noon. DISCUSSION ED risk or clinical status can be predicted via machine learning with moderate accuracy using Internet activity variables. This model, if replicated in larger samples where it demonstrates stronger predictive value, could identify populations where further assessment is merited. Future iterations could also inform tailored digital interventions, timed to be provided when target online behaviors occur, thereby potentially improving the well-being of many individuals who may otherwise remain undetected.
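The relative improvement quoted in the abstract above is plain arithmetic on the two accuracies. A minimal Python sketch, using only the figures given in the abstract (the function name `relative_improvement` is illustrative and does not come from the study):

```python
def relative_improvement(model_accuracy: float, baseline_accuracy: float) -> float:
    """Gain over the baseline, expressed as a fraction of the baseline."""
    return (model_accuracy - baseline_accuracy) / baseline_accuracy


# Figures quoted in the abstract: 52.6% model accuracy vs 38.1% random-decision accuracy.
gain = relative_improvement(0.526, 0.381)

# Share of the 936 survey participants who contributed their browsing history.
share = 231 / 936

print(f"relative improvement: {gain:.1%}")
print(f"participants sharing history: {share:.1%}")
```

Expressing the gain relative to the chance baseline, rather than as a raw difference in percentage points, is what makes a three-class accuracy of 52.6% interpretable.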
Affiliation(s)
- Shiri Sadeh-Sharvit
- Baruch Ivcher School of Psychology, Interdisciplinary Center, Herzliya, Israel; Center for m2Health, Palo Alto University, Palo Alto, CA, USA
- C. Barr Taylor
- Center for m2Health, Palo Alto University, Palo Alto, CA, USA; Stanford University, Stanford, CA, USA
|
3
|
Schueller SM, Steakley-Freeman DM, Mohr DC, Yom-Tov E. Understanding perceived barriers to treatment from web browsing behavior. J Affect Disord 2020; 267:63-66. [PMID: 32063574 DOI: 10.1016/j.jad.2020.01.131] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/06/2019] [Revised: 12/02/2019] [Accepted: 01/21/2020] [Indexed: 10/25/2022]
Abstract
BACKGROUND The expanding amount of information available from our use of technologies has led researchers to explore how this information can aid in the detection of mental health issues. We expand on past work in this area by exploring how browsing histories might be able to predict perceived barriers to psychological treatment. METHODS We obtained 10 days of browsing history data for 255 respondents as well as assessments of Perceived Barriers to Psychological Treatments and depression, the Patient Health Questionnaire. RESULTS We found that browsing histories enabled high performance classification of people with high levels of perceived barriers to psychological treatments (AUC average of 0.86). LIMITATIONS Our high classification accuracy does not help understand why different features within the browsing histories are useful to classify people according to browsing history. We also look at people who decided to contribute their browsing history but the use of this data more generally presents additional ethical questions. CONCLUSIONS Browsing histories might be useful to classify people's barriers to seeking psychological treatment. It is clinically relevant to find those who perceive barriers to seeking treatment to better design ways to address those concerns and help them find treatment.
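The AUC reported above can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the labels and scores are invented toy values, not data from the study, and the abstract does not say how the authors computed their AUC.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = high perceived barriers, 0 = low perceived barriers.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc(toy_labels, toy_scores)
print(f"toy AUC: {toy_auc:.3f}")
```

The quadratic loop is fine for small illustrations; for large samples a rank-based formulation or a library routine would be the usual choice.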
Affiliation(s)
- Stephen M Schueller
- Department of Psychological Science, University of California, Irvine, Irvine, CA, USA; Department of Preventive Medicine, Northwestern University, Chicago, IL, USA.
- David C Mohr
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
|
4
|
Samaras L, García-Barriocanal E, Sicilia MA. Syndromic surveillance using web data: a systematic review. INNOVATION IN HEALTH INFORMATICS 2020. [PMCID: PMC7153324 DOI: 10.1016/b978-0-12-819043-2.00002-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
In recent years, there has been considerable debate about the evolution of Smart Healthcare systems, particularly about how these systems can help improve human health by taking advantage of new Information and Communication Technologies (ICT) for early prediction and efficient treatment. The purpose of this study is to provide a systematic review of the current literature on information systems for syndromic surveillance using web data. The published items considered include articles, books, reviews, reports, conference announcements, and dissertations. We used a variation of the PRISMA Statement methodology to conduct the systematic review. The review identifies the relevant papers published from 2004 to 2018 and systematically explores them to extract similarities, gaps, and conclusions on the research done so far. The results presented concern the year, the examined disease, the web data source, the geographic location/country, and the data analysis method used. The results show that influenza is the most examined infectious disease. The internet tools most used are Twitter and Google. Regarding the geographical areas explored in the published papers, the most examined country is the United States, since many scientists come from this country. There has been significant growth in the number of articles since 2009. Various statistical methods are also used to correlate the data retrieved from the internet with the data from national authorities. The overall conclusion of the reviewed research is that the Web can be a useful tool for detecting serious epidemics and for creating a syndromic surveillance system, since epidemics can be predicted from web data before they are officially detected in the population.
With the advance of ICT, Smart Healthcare can benefit from the monitoring and early prediction of epidemics that such a system provides, improving national or international health strategies and policy decisions. This can be achieved through the provision of new technology tools to enhance health monitoring systems toward the new innovations of Smart Health or eHealth, even with the emerging technologies of the Internet of Things. The challenges and impacts of an electronic system based on internet data span the social, medical, and technological disciplines. These can be further extended to Smart Healthcare, as data streaming can provide real-time information, awareness of epidemics, and alerts for both patients and medical scientists. Finally, these new systems can help improve the standards of human life.
|
5
|
Jordan KN, Pennebaker JW, Petrie KJ, Dalbeth N. Googling Gout: Exploring Perceptions About Gout Through a Linguistic Analysis of Online Search Activities. Arthritis Care Res (Hoboken) 2019; 71:419-426. [PMID: 29781577 DOI: 10.1002/acr.23598] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/21/2017] [Accepted: 05/15/2018] [Indexed: 12/31/2022]
Abstract
OBJECTIVE To understand what terms people seeking information about gout use most frequently in online searches and to explore the psychological and emotional tone of these searches. METHODS A large de-identified data set of search histories from major search engines was analyzed. Participants who searched for gout (n = 1,117), arthritis (arthritis search control group, age and sex-matched, n = 2,036), and a random set of age and sex-matched participants (general control group, n = 2,150) were included. Searches were analyzed using Meaning Extraction Helper and Linguistic Inquiry and Word Count. RESULTS The most frequent unique searches in the gout search group included gout-related and food-related terms. Those who searched for gout were most likely to search for words related to eating or avoidance. In contrast, those who searched for arthritis were more likely to search for disease- or health-related words. Compared with the general control group, higher information seeking was observed for the gout and arthritis search groups. Compared with the general control group, both the gout and arthritis search groups searched for more food-related words and fewer leisure and sex-related words. The searches of both the gout and arthritis search groups were lower in positivity and higher in the frequency of sadness-related words. CONCLUSION The perception of gout as a condition managed by dietary strategies aligns with online information seeking about the disease and its management. In contrast, people searching for information about arthritis focus more on medical strategies. Linguistic analyses reflect greater disability in social and leisure activities and lower positive emotion for those searching for gout or arthritis.
|
6
|
Giat E, Yom-Tov E. Evidence From Web-Based Dietary Search Patterns to the Role of B12 Deficiency in Non-Specific Chronic Pain: A Large-Scale Observational Study. J Med Internet Res 2018; 20:e4. [PMID: 29305340 PMCID: PMC5775484 DOI: 10.2196/jmir.8667] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/08/2017] [Revised: 11/01/2017] [Accepted: 11/04/2017] [Indexed: 01/22/2023]
Abstract
BACKGROUND Profound vitamin B12 deficiency is a known cause of disease, but the role of low or intermediate levels of B12 in the development of neuropathy and other neuropsychiatric symptoms, as well as the relationship between eating meat and B12 levels, is unclear. OBJECTIVE The objective of our study was to investigate the role of low or intermediate levels of B12 in the development of neuropathy and other neuropsychiatric symptoms. METHODS We used food-related Internet search patterns from a sample of 8.5 million people based in the US as a proxy for B12 intake and correlated these searches with Internet searches related to possible effects of B12 deficiency. RESULTS Food-related search patterns were highly correlated with known consumption and food-related searches (ρ=.69). Awareness of B12 deficiency was associated with a higher consumption of B12-rich foods and with queries for B12 supplements. Searches for terms related to neurological disorders were correlated with searches for B12-poor foods, in contrast with control terms. Popular medicines, those having fewer indications, and those which are predominantly used to treat pain, were more strongly correlated with the ability to predict neuropathic pain queries using the B12 contents of food. CONCLUSIONS Our findings show that Internet search patterns are a useful way of investigating health questions in large populations, and suggest that low B12 intake may be associated with a broader spectrum of neurological disorders than previously thought.
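The ρ reported above is conventionally a Spearman rank correlation, although the abstract does not name the statistic. As a hedged sketch of how such a coefficient is computed, the following ranks two series and takes the Pearson correlation of the ranks. The two series are invented toy values, not data from the study.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy series: search share per food vs. a consumption measure.
toy_search_share = [0.10, 0.25, 0.30, 0.45, 0.60]
toy_consumption = [12.0, 9.0, 20.0, 15.0, 30.0]
rho = spearman_rho(toy_search_share, toy_consumption)
print(f"toy Spearman rho: {rho:.2f}")
```

Rank-based correlation is a natural fit for search data because query volumes are heavily skewed, and ranks are insensitive to that skew.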
Affiliation(s)
- Eitan Giat
- Rheumatology Unit, The Autoimmune Center, Sheba Medical Center, Ramat Gan, Israel
|
7
|
|
8
|
Quantifying Network Dynamics and Information Flow Across Chinese Social Media During the African Ebola Outbreak. Disaster Med Public Health Prep 2017; 12:26-37. [PMID: 28760166 DOI: 10.1017/dmp.2017.29] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
OBJECTIVE Social media provides us with a new platform on which to explore how the public responds to disasters and, of particular importance, how they respond to the emergence of infectious diseases such as Ebola. Provided it is appropriately informed, social media offers a potentially powerful means of supporting both early detection and effective containment of communicable diseases, which is essential for improving disaster medicine and public health preparedness. METHODS The 2014 West African Ebola outbreak is a particularly relevant contemporary case study on account of the large number of annual arrivals from Africa, including Chinese employees engaged in projects in Africa. Weibo (Weibo Corp, Beijing, China) is China's most popular social media platform, with more than 2 billion users and over 300 million daily posts, and offers great opportunity to monitor early detection and promotion of public health awareness. RESULTS We present a proof-of-concept study of a subset of Weibo posts during the outbreak demonstrating potential and identifying priorities for improving the efficacy and accuracy of information dissemination. We quantify the evolution of the social network topology within Weibo relating to the efficacy of information sharing. CONCLUSIONS We show how relatively few nodes in the network can have a dominant influence over both the quality and quantity of the information shared. These findings make an important contribution to disaster medicine and public health preparedness from theoretical and methodological perspectives for dealing with epidemics. (Disaster Med Public Health Preparedness. 2018;12:26-37).
|
9
|
Menachemi N, Rahurkar S, Rahurkar M. Using Web-Based Search Data to Study the Public's Reactions to Societal Events: The Case of the Sandy Hook Shooting. JMIR Public Health Surveill 2017; 3:e12. [PMID: 28336508 PMCID: PMC5383805 DOI: 10.2196/publichealth.6033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/08/2016] [Revised: 10/29/2016] [Accepted: 02/03/2017] [Indexed: 11/21/2022]
Abstract
Background Internet search is the most common activity on the World Wide Web and generates a vast amount of user-reported data regarding their information-seeking preferences and behavior. Although this data has been successfully used to examine outbreaks, health care utilization, and outcomes related to quality of care, its value in informing public health policy remains unclear. Objective The aim of this study was to evaluate the role of Internet search query data in health policy development. To do so, we studied the public’s reaction to a major societal event in the context of the 2012 Sandy Hook School shooting incident. Methods Query data from the Yahoo! search engine regarding firearm-related searches was analyzed to examine changes in user-selected search terms and subsequent websites visited for a period of 14 days before and after the shooting incident. Results A total of 5,653,588 firearm-related search queries were analyzed. In the after period, queries increased for search terms related to “guns” (+50.06%), “shooting incident” (+333.71%), “ammunition” (+155.14%), and “gun-related laws” (+535.47%). The highest increase (+1054.37%) in Web traffic was seen by news websites following “shooting incident” queries whereas searches for “guns” (+61.02%) and “ammunition” (+173.15%) resulted in notable increases in visits to retail websites. Firearm-related queries generally returned to baseline levels after approximately 10 days. Conclusions Search engine queries present a viable infodemiology metric on public reactions and subsequent behaviors to major societal events and could be used by policymakers to inform policy development.
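The percentage increases in the abstract above compare query volume in the 14 days after the event with the 14 days before it. A minimal sketch of that before/after calculation is shown below. The abstract reports only the percentages, so the raw counts here are hypothetical, chosen to reproduce the quoted +50.06% for "guns".

```python
def percent_change(before: float, after: float) -> float:
    """Change from the 'before' window to the 'after' window, in percent."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return (after - before) / before * 100.0


# Hypothetical query counts for one term category; not taken from the study.
guns_before = 10_000
guns_after = 15_006
change = percent_change(guns_before, guns_after)
print(f"guns: {change:+.2f}%")
```

Using equal-length windows on either side of the event keeps the comparison from being distorted by differences in exposure time.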
Affiliation(s)
- Nir Menachemi
- Richard M. Fairbanks School of Public Health, Health Policy and Management, Indiana University-IUPUI, Indianapolis, IN, United States; Regenstrief Institute, Center for Biomedical Informatics, Indianapolis, IN, United States
- Saurabh Rahurkar
- Regenstrief Institute, Center for Biomedical Informatics, Indianapolis, IN, United States
|
10
|
Ben-Sasson A, Yom-Tov E. Online Concerns of Parents Suspecting Autism Spectrum Disorder in Their Child: Content Analysis of Signs and Automated Prediction of Risk. J Med Internet Res 2016; 18:e300. [PMID: 27876688 PMCID: PMC5141337 DOI: 10.2196/jmir.5439] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/16/2015] [Revised: 06/29/2016] [Accepted: 09/22/2016] [Indexed: 02/05/2023]
Abstract
Background Online communities are used as platforms by parents to verify developmental and health concerns related to their child. The increasing public awareness of autism spectrum disorders (ASD) leads more parents to suspect ASD in their child. Early identification of ASD is important for early intervention. Objective To characterize the symptoms mentioned in online queries posed by parents who suspect that their child might have ASD and determine whether they are age-specific. To test the efficacy of machine learning tools in classifying the child’s risk of ASD based on the parent’s narrative. Methods To this end, we analyzed online queries posed by parents who were concerned that their child might have ASD and categorized the warning signs they mentioned according to ASD-specific and non-ASD–specific domains. We then used the data to test the efficacy with which a trained machine learning tool classified the degree of ASD risk. Yahoo Answers, a social site for posting queries and finding answers, was mined for queries of parents asking the community whether their child has ASD. A total of 195 queries were sampled for this study (mean child age=38.0 months; 84.7% [160/189] boys). Content text analysis of the queries aimed to categorize the types of symptoms described and obtain clinical judgment of the child’s ASD-risk level. Results Concerns related to repetitive and restricted behaviors and interests (RRBI) were the most prevalent (75.4%, 147/195), followed by concerns related to language (61.5%, 120/195) and emotional markers (50.3%, 98/195). Of the 195 queries, 18.5% (36/195) were rated by clinical experts as low-risk, 30.8% (60/195) as medium-risk, and 50.8% (99/195) as high-risk. Risk groups differed significantly (P<.001) in the rate of concerns in the language, social, communication, and RRBI domains. 
When testing whether an automatic classifier (decision tree) could predict if a query was medium- or high-risk based on the text of the query and the coded symptoms, performance reached an area under the receiver operating characteristic (ROC) curve of 0.67 (95% CI 0.50-0.78), whereas predicting from the text and the coded signs resulted in an area under the curve of 0.82 (0.80-0.86). Conclusions Findings call for health care providers to closely listen to parental ASD-related concerns, as recommended by screening guidelines. They also demonstrate the need for Internet-based screening systems that utilize parents’ narratives using a decision tree questioning method.
Affiliation(s)
- Elad Yom-Tov
- Microsoft Research & Development, Herzelia, Israel
|
11
|
Callahan A, Pernek I, Stiglic G, Leskovec J, Strasberg HR, Shah NH. Analyzing Information Seeking and Drug-Safety Alert Response by Health Care Professionals as New Methods for Surveillance. J Med Internet Res 2015; 17:e204. [PMID: 26293444 PMCID: PMC4642796 DOI: 10.2196/jmir.4427] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 03/19/2015] [Accepted: 07/24/2015] [Indexed: 11/13/2022]
Abstract
Background Patterns in general consumer online search logs have been used to monitor health conditions and to predict health-related activities, but the multiple contexts within which consumers perform online searches make significant associations difficult to interpret. Physician information-seeking behavior has typically been analyzed through survey-based approaches and literature reviews. Activity logs from health care professionals using online medical information resources are thus a valuable yet relatively untapped resource for large-scale medical surveillance. Objective To analyze health care professionals’ information-seeking behavior and assess the feasibility of measuring drug-safety alert response from the usage logs of an online medical information resource. Methods Using two years (2011-2012) of usage logs from UpToDate, we measured the volume of searches related to medical conditions with significant burden in the United States, as well as the seasonal distribution of those searches. We quantified the relationship between searches and resulting page views. Using a large collection of online mainstream media articles and Web log posts we also characterized the uptake of a Food and Drug Administration (FDA) alert via changes in UpToDate search activity compared with general online media activity related to the subject of the alert. Results Diseases and symptoms dominate UpToDate searches. Some searches result in page views of only short duration, while others consistently result in longer-than-average page views. The response to an FDA alert for Celexa, characterized by a change in UpToDate search activity, differed considerably from general online media activity. Changes in search activity appeared later and persisted longer in UpToDate logs. The volume of searches and page view durations related to Celexa before the alert also differed from those after the alert. 
Conclusions Understanding the information-seeking behavior associated with online evidence sources can offer insight into the information needs of health professionals and enable large-scale medical surveillance. Our Web log mining approach has the potential to monitor responses to FDA alerts at a national level. Our findings can also inform the design and content of evidence-based medical information resources such as UpToDate.
Affiliation(s)
- Alison Callahan
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States.
|
12
|
Abstract
Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which make it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and are therefore not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will have, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88% and can thus be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and the data on reformulations made by users in the past can aid the development of better search systems, particularly to improve results for novice users. This paper therefore offers important ideas for better understanding how people search and how to use this knowledge to improve the performance of specialized medical search engines.
|
13
|
Yom-Tov E, Borsa D, Hayward AC, McKendry RA, Cox IJ. Automatic identification of Web-based risk markers for health events. J Med Internet Res 2015; 17:e29. [PMID: 25626480 PMCID: PMC4327439 DOI: 10.2196/jmir.4082] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 11/26/2014] [Revised: 12/22/2014] [Accepted: 01/12/2015] [Indexed: 11/28/2022]
Abstract
BACKGROUND The escalating cost of global health care is driving the development of new technologies to identify early indicators of an individual's risk of disease. Traditionally, epidemiologists have identified such risk factors using medical databases and lengthy clinical studies but these are often limited in size and cost and can fail to take full account of diseases where there are social stigmas or to identify transient acute risk factors. OBJECTIVE Here we report that Web search engine queries coupled with information on Wikipedia access patterns can be used to infer health events associated with an individual user and automatically generate Web-based risk markers for some of the common medical conditions worldwide, from cardiovascular disease to sexually transmitted infections and mental health conditions, as well as pregnancy. METHODS Using anonymized datasets, we present methods to first distinguish individuals likely to have experienced specific health events, and classify them into distinct categories. We then use the self-controlled case series method to find the incidence of health events in risk periods directly following a user's search for a query category, and compare to the incidence during other periods for the same individuals. RESULTS Searches for pet stores were risk markers for allergy. We also identified some possible new risk markers; for example: searching for fast food and theme restaurants was associated with a transient increase in risk of myocardial infarction, suggesting this exposure goes beyond a long-term risk factor but may also act as an acute trigger of myocardial infarction. Dating and adult content websites were risk markers for sexually transmitted infections, such as human immunodeficiency virus (HIV). CONCLUSIONS Web-based methods provide a powerful, low-cost approach to automatically identify risk factors, and support more timely and personalized public health efforts to bring human and economic benefits.
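The self-controlled case series idea described above compares how often events occur in the risk period after an exposure (here, a search in some query category) with how often they occur at other times for the same individuals. The sketch below computes only a crude pooled incidence rate ratio to illustrate that comparison. It is a simplification, not the conditional Poisson model that a full self-controlled case series analysis fits, and all counts are hypothetical.

```python
def incidence_rate_ratio(events_risk, days_risk, events_control, days_control):
    """Event rate in risk periods divided by event rate in control periods."""
    if days_risk <= 0 or days_control <= 0:
        raise ValueError("observation time must be positive")
    if events_control == 0:
        raise ValueError("control-period rate is zero; ratio is undefined")
    rate_risk = events_risk / days_risk
    rate_control = events_control / days_control
    return rate_risk / rate_control


# Hypothetical pooled counts across users (not from the study):
# 12 events in 300 person-days inside risk windows,
# 40 events in 3000 person-days outside them.
irr = incidence_rate_ratio(12, 300, 40, 3000)
print(f"crude incidence rate ratio: {irr:.2f}")
```

Because each person serves as their own control, fixed individual characteristics cancel out of the comparison, which is the main appeal of the design for anonymized search logs.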
|