1
Berke A, Calacci D, Mahari R, Yabe T, Larson K, Pentland S. Open e-commerce 1.0, five years of crowdsourced U.S. Amazon purchase histories with user demographics. Sci Data 2024; 11:491. [PMID: 38740768 DOI: 10.1038/s41597-024-03329-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/29/2024] [Indexed: 05/16/2024]
Abstract
This is a first-of-its-kind dataset containing detailed purchase histories from 5027 U.S. Amazon.com consumers, spanning 2018 through 2022, with more than 1.8 million purchases. Consumer spending data are customarily collected through government surveys to produce public datasets and statistics, which serve public agencies and researchers. Companies now collect similar data through consumers' use of digital platforms at rates superseding data collection by public agencies. We published this dataset in an effort towards democratizing access to rich data sources routinely used by companies. The data were crowdsourced through an online survey and shared with participants' informed consent. Data columns include order date, product code, title, price, quantity, and shipping address state. Each purchase history is linked to survey data with information about participants' demographics, lifestyle, and health. We validate the dataset by showing expenditure correlates with public Amazon sales data (Pearson r = 0.978, p < 0.001) and conduct analyses of specific product categories, demonstrating expected seasonal trends and strong relationships to other public datasets.
Affiliation(s)
- Alex Berke
  - MIT Media Lab, Cambridge, MA, 02139, USA
- Dan Calacci
  - MIT Media Lab, Cambridge, MA, 02139, USA
  - Princeton University, Princeton, NJ, 08544, USA
- Robert Mahari
  - MIT Media Lab, Cambridge, MA, 02139, USA
  - Harvard Law School, Cambridge, MA, 02138, USA
- Takahiro Yabe
  - MIT Institute of Data, Systems, and Society (IDSS), Cambridge, MA, 02139, USA
  - New York University Center for Urban Science and Progress, Brooklyn, NY, 11201, USA
- Sandy Pentland
  - MIT Media Lab, Cambridge, MA, 02139, USA
  - MIT Connection Science, Cambridge, MA, 02139, USA
2
Vanni F, Lambert D. On an Aggregated Estimate for Human Mobility Regularities through Movement Trends and Population Density. Entropy (Basel, Switzerland) 2024; 26:398. [PMID: 38785646 PMCID: PMC11119206 DOI: 10.3390/e26050398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 04/28/2024] [Accepted: 04/29/2024] [Indexed: 05/25/2024]
Abstract
This article introduces an analytical framework that interprets individual measures of entropy-based mobility derived from mobile phone data. We explore and analyze two widely recognized entropy metrics: random entropy and uncorrelated Shannon entropy. These metrics are estimated through collective variables of human mobility, including movement trends and population density. By employing a collisional model, we establish statistical relationships between entropy measures and mobility variables. Furthermore, our research addresses three primary objectives: firstly, validating the model; secondly, exploring correlations between aggregated mobility and entropy measures in comparison to five economic indicators; and finally, demonstrating the utility of entropy measures. Specifically, we provide an effective population density estimate that offers a more realistic understanding of social interactions. This estimation takes into account both movement regularities and intensity, utilizing real-time data analysis conducted during the peak period of the COVID-19 pandemic.
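The two entropy metrics named in this abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the human mobility literature (random entropy as the base-2 logarithm of the number of distinct locations visited, and uncorrelated Shannon entropy computed from visit frequencies); the abstract itself does not spell out the formulas, and the function names and the toy visit sequence are hypothetical.

```python
import math
from collections import Counter


def random_entropy(visits):
    """Random entropy: log2 of the number of distinct locations visited.

    Assumes every distinct location is visited with equal probability.
    """
    if not visits:
        raise ValueError("visits must contain at least one location")
    return math.log2(len(set(visits)))


def uncorrelated_entropy(visits):
    """Uncorrelated Shannon entropy: -sum(p * log2(p)) over visit frequencies.

    Uses how often each location is visited, but ignores visit order.
    """
    if not visits:
        raise ValueError("visits must contain at least one location")
    total = len(visits)
    counts = Counter(visits)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical visit sequence for one individual (illustrative only).
visits = ["home", "work", "home", "gym", "home", "work", "home", "home"]
print(random_entropy(visits), uncorrelated_entropy(visits))
```

Because the uncorrelated entropy accounts for uneven visit frequencies, it never exceeds the random entropy for the same sequence; the two coincide only when every location is visited equally often.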
Affiliation(s)
- Fabio Vanni
  - Department of Economics, University of Insubria, 21100 Varese, Italy
  - Université Côte d’Azur, CNRS, GREDEG, 06103 Nice-Sophia Antipolis, France
- David Lambert
  - Department of Physics, University of North Texas, Denton, TX 76205, USA
3
Yang Y, Pentland A, Moro E. Identifying latent activity behaviors and lifestyles using mobility data to describe urban dynamics. EPJ Data Science 2023; 12:15. [PMID: 37220629 PMCID: PMC10193357 DOI: 10.1140/epjds/s13688-023-00390-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 05/09/2023] [Indexed: 05/25/2023]
Abstract
Urbanization and its problems require an in-depth and comprehensive understanding of urban dynamics, especially the complex and diversified lifestyles in modern cities. Digitally acquired data can accurately capture complex human activity, but it lacks the interpretability of demographic data. In this paper, we study a privacy-enhanced dataset of the mobility visitation patterns of 1.2 million people to 1.1 million places in 11 metro areas in the U.S. to detect the latent mobility behaviors and lifestyles in the largest American cities. Despite the considerable complexity of mobility visitations, we found that lifestyles can be automatically decomposed into only 12 latent interpretable activity behaviors on how people combine shopping, eating, working, or using their free time. Rather than describing individuals with a single lifestyle, we find that city dwellers' behavior is a mixture of those behaviors. Those detected latent activity behaviors are equally present across cities and cannot be fully explained by main demographic features. Finally, we find those latent behaviors are associated with dynamics like experienced income segregation, transportation, or healthy behaviors in cities, even after controlling for demographic features. Our results signal the importance of complementing traditional census data with activity behaviors to understand urban dynamics. Supplementary Information The online version contains supplementary material available at 10.1140/epjds/s13688-023-00390-w.
Affiliation(s)
- Yanni Yang
  - Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
  - Connection Science, Institute for Data Science and Society, Massachusetts Institute of Technology, Cambridge, MA United States
- Alex Pentland
  - Connection Science, Institute for Data Science and Society, Massachusetts Institute of Technology, Cambridge, MA United States
- Esteban Moro
  - Connection Science, Institute for Data Science and Society, Massachusetts Institute of Technology, Cambridge, MA United States
  - Grupo Interdisciplinar de Sistemas Complejos (GISC), Department of Mathematics, Universidad Carlos III de Madrid, Leganés, Madrid, Spain
4
Bian R, Murray-Tuite P, Wolshon B. Predicting Grocery Store Visits During the Early Outbreak of COVID-19 with Machine Learning. Transportation Research Record 2023; 2677:79-91. [PMID: 37153205 PMCID: PMC10149515 DOI: 10.1177/03611981211043538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/09/2023]
Abstract
While non-essential travel was canceled during the coronavirus infectious disease (COVID-19) pandemic, grocery shopping was essential. The objectives of this study were to: 1) examine how grocery store visits changed during the early outbreak of COVID-19, and 2) estimate a model to predict the change of grocery store visits in the future, within the same phase of the pandemic. The study period (February 15-May 31, 2020) covered the outbreak and phase-one re-opening. Six counties/states in the United States were examined. Grocery store visits (in-store or curbside pickup) increased over 20% when the national emergency was declared on March 13 and then decreased below the baseline within a week. Grocery store visits on weekends were affected more significantly than those on workdays before late April. Grocery store visits in some states (including California, Louisiana, New York, and Texas) started returning to normal by the end of May, but that was not the case for some of the counties (including those with the cities of Los Angeles and New Orleans). With data from Google Mobility Reports, this study used a long short-term memory network to predict the change of grocery store visits from the baseline in the future. The networks trained with the national data or the county data performed well in predicting the general trend of each county. The results from this study could help understand mobility patterns of grocery store visits during the pandemic and predict the process of returning to normal.
Affiliation(s)
- Ruijie Bian
  - Louisiana Transportation Research Center, Louisiana State University, Baton Rouge, LA
- Brian Wolshon
  - Department of Civil and Environmental Engineering, Louisiana State University, Baton Rouge, LA
5
Pentland A. Toward Network Intelligence. Neural Comput 2023; 35:525-535. [PMID: 36112921 DOI: 10.1162/neco_a_01536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2021] [Accepted: 01/08/2022] [Indexed: 11/04/2022]
Abstract
This article proposes a conceptual framework to guide research in neural computation by relating it to mathematical progress in other fields and to examples illustrative of biological networks. The goal is to provide insight into how biological networks, and possibly large artificial networks such as foundation models, transition from analog computation to an analog approximation of symbolic computation. From the mathematical perspective, I focus on the development of consistent symbolic representations and optimal policies for action selection within network settings. From the biological perspective, I give examples of human and animal social network behavior that may be described using these mathematical models.
Affiliation(s)
- Alex Pentland
  - Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.
6
Karmazyn-Raz H, Smith LB. Sampling statistics are like story creation: a network analysis of parent-toddler exploratory play. Philos Trans R Soc Lond B Biol Sci 2023; 378:20210358. [PMID: 36571129 PMCID: PMC9791483 DOI: 10.1098/rstb.2021.0358] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/14/2022] [Accepted: 09/04/2022] [Indexed: 12/27/2022]
Abstract
Actions in the world elicit data for learning and do so in a stream of interconnected events. Here, we provide evidence on how toddlers with their parent sample information by acting on toys during exploratory play. We observed 10 min of free-flowing and unconstrained object exploration by toddlers (mean age 21 months) and parents in a room with many available objects (n = 32). Borrowing concepts and measures from the study of narratives, we found that the toy selections are not a string of unrelated events but exhibit a suite of what we call coherence statistics: Zipfian distributions, burstiness and a network structure. We discuss the transient memory processes that underlie the moment-to-moment toy selections that create this coherence and the role of these statistics in the development of abstract and generalizable systems of knowledge. This article is part of the theme issue 'Concepts in interaction: social engagement and inner experiences'.
Affiliation(s)
- Hadar Karmazyn-Raz
  - Psychological and Brain Sciences, Indiana University, Bloomington, IN 47401, USA
- Linda B. Smith
  - Psychological and Brain Sciences, Indiana University, Bloomington, IN 47401, USA
7
Abstract
It is often believed that regularities are embedded in mobile behaviors. Highly regular mobile behaviors, such as daily commutes between home and workplace, have been actively investigated in the context of health risks. Less regular mobile behaviors, such as visits to service places (e.g., supermarkets and healthcare facilities), have not received much attention. This study explores the regularity in service place visits using a deep learning method and the effect of place type on the stability of recurring visits using an entropy assessment. Results reveal both periodic and bursty visit behaviors to service places. The periodic visits are prominent on the weekly and bi-weekly scales, and the bursty visits dominate the multi-day scales. Service place type indeed affects the stability of recurring visits, and certain place types have the strongest effect. The research findings substantially expand the knowledge of mobile behaviors and are valuable in informing both visitor-based and place-based health risks.
Affiliation(s)
- Shiran Zhong
  - Department of Geography, University at Buffalo, the State University of New York, 105 Wilkeson Quad, Buffalo, NY 14261, USA
  - Human Environments Analysis Lab, Western University, 1151 Richmond Street, London, Ontario, N6A 3K7, Canada
  - Department of Geography & Environment, Western University, 1151 Richmond Street, London, Ontario, N6A 3K7, Canada
- Ling Bian
  - Department of Geography, University at Buffalo, the State University of New York, 105 Wilkeson Quad, Buffalo, NY 14261, USA
8
Karmazyn-Raz H, Smith LB. Discourse with Few Words: Coherence Statistics, Parent-Infant Actions on Objects, and Object Names. Language Acquisition 2022; 30:211-229. [PMID: 37736139 PMCID: PMC10513098 DOI: 10.1080/10489223.2022.2054342] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/16/2021] [Accepted: 03/05/2022] [Indexed: 09/23/2023]
Abstract
The data for early object name learning is often conceptualized as a problem of mapping heard names to referents. However, infants do not hear object names as discrete events but rather in extended interactions organized around goal-directed actions on objects. The present study examined the statistical structure of the nonlinguistic events that surround parent naming of objects. Parents and 12-month-old infants were left alone in a room for 10 minutes with 32 objects available for exploration. Parent and infant handling of objects and parent naming of objects were coded. The four measured statistics were from measures used in the study of coherent discourse: (1) a frequency distribution in which actions were frequently directed to a few objects and more rarely to other objects; (2) repeated returns to the high-frequency objects over the 10-minute play period; (3) clustered repetitions, continuity, of actions on objects; and (4) structured networks of transitions among objects in play that connected all the played-with objects. Parent naming was infrequent but related to the statistics of object-directed actions. The implications of the discourse-like stream of actions are discussed in terms of learning mechanisms that could support rapid learning of object names from relatively few name-object co-occurrences.
Affiliation(s)
- Linda B Smith
  - Indiana University, Bloomington, US
  - University of East Anglia, Norfolk, UK
9
Human Mobility Support for Personalised Data Offloading. IEEE Transactions on Network and Service Management 2022. [DOI: 10.1109/tnsm.2022.3153804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
10
Farzanehfar A, Houssiau F, de Montjoye YA. The risk of re-identification remains high even in country-scale location datasets. Patterns (New York, N.Y.) 2021; 2:100204. [PMID: 33748793 PMCID: PMC7961185 DOI: 10.1016/j.patter.2021.100204] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/24/2020] [Revised: 11/27/2020] [Accepted: 01/07/2021] [Indexed: 11/30/2022]
Abstract
Although anonymous data are not considered personal data, recent research has shown how individuals can often be re-identified. Scholars have argued that previous findings apply only to small-scale datasets and that privacy is preserved in large-scale datasets. Using 3 months of location data, we (1) show the risk of re-identification to decrease slowly with dataset size, (2) approximate this decrease with a simple model taking into account three population-wide marginal distributions, and (3) prove that unicity is convex and obtain a linear lower bound. Our estimates show that 93% of people would be uniquely identified in a dataset of 60M people using four points of auxiliary information, with a lower bound at 22%. This lower bound increases to 87% when five points are available. Taken together, our results show how the privacy of individuals is very unlikely to be preserved even in country-scale location datasets.
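The notion of unicity used in this abstract (the share of people whose trace is uniquely pinned down by a few known points) can be illustrated with a brute-force empirical estimator. This is only a sketch under stated assumptions: it is not the analytical model or the lower bound derived in the paper, and the function name `estimate_unicity`, the `(location, hour)` point format, and the toy traces are all hypothetical.

```python
import random


def estimate_unicity(traces, p, seed=0):
    """Estimate the fraction of users uniquely identified by p known points.

    traces: dict mapping a user id to a set of hashable, sortable points,
            e.g. (location, hour) tuples.
    p:      number of auxiliary points an attacker is assumed to know.
    """
    rng = random.Random(seed)
    # Only users with at least p points can be probed with p known points.
    eligible = [user for user, points in traces.items() if len(points) >= p]
    if not eligible:
        raise ValueError("no trace has at least p points")

    unique = 0
    for user in eligible:
        # Draw p points from this user's trace (sorted for reproducibility).
        known = set(rng.sample(sorted(traces[user]), p))
        # Count every trace in the dataset that contains all known points.
        matches = sum(1 for points in traces.values() if known <= points)
        if matches == 1:
            unique += 1
    return unique / len(eligible)


# Hypothetical toy dataset: three users, two (location, hour) points each.
toy_traces = {
    "a": {("cell1", 8), ("cell2", 9)},
    "b": {("cell1", 8), ("cell3", 9)},
    "c": {("cell4", 8), ("cell5", 9)},
}
print(estimate_unicity(toy_traces, p=2))
```

The nested scan over all traces is quadratic in the number of users, so this form is only practical for small samples; it is meant to make the definition concrete rather than to scale to country-sized datasets.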
Affiliation(s)
- Ali Farzanehfar
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
11
Abstract
Food safety continues to threaten public health. Machine learning holds potential in leveraging large, emerging data sets to improve the safety of the food supply and mitigate the impact of food safety incidents. Foodborne pathogen genomes and novel data streams, including text, transactional, and trade data, have seen emerging applications enabled by a machine learning approach, such as prediction of antibiotic resistance, source attribution of pathogens, and foodborne outbreak detection and risk assessment. In this article, we provide a gentle introduction to machine learning in the context of food safety and an overview of recent developments and applications. With many of these applications still in their nascence, general and domain-specific pitfalls and challenges associated with machine learning have begun to be recognized and addressed, which are critical to prospective use and future deployment of large data sets and their associated machine learning models for food safety applications.
Affiliation(s)
- Xiangyu Deng
  - Center for Food Safety, University of Georgia, Griffin, Georgia 30223, USA
- Shuhao Cao
  - Department of Mathematics and Statistics, Washington University, St. Louis, Missouri 63105, USA
- Abigail L Horn
  - Department of Preventive Medicine, University of Southern California, Los Angeles, California 90032, USA
12
Higher-order statistics based multifractal predictability measures for anisotropic turbulence and the theoretical limits of aviation weather forecasting. Sci Rep 2019; 9:19829. [PMID: 31882685 PMCID: PMC6934490 DOI: 10.1038/s41598-019-56304-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/12/2019] [Accepted: 12/04/2019] [Indexed: 11/08/2022]
Abstract
Theoretical predictability measures of turbulent atmospheric flows are essential in estimating how realistic the current storm-scale strategic forecast skill expectations are. Atmospheric predictability studies in the past have usually neglected intermittency and anisotropy, which are typical features of atmospheric flows, rendering their application to the storm-scale weather regime ineffective. Furthermore, these studies are frequently limited to second-order statistical measures, which do not contain information about the rarer, more severe, and, therefore, more important (from a forecasting and mitigation perspective) weather events. Here we overcome these rather severe limitations by proposing an analytical expression for the theoretical predictability limits of anisotropic multifractal fields based on higher-order autocorrelation functions. The predictability limits are dependent on the order of statistical moment (q) and are smaller for larger q. Since higher-order statistical measures take into account rarer events, such more extreme phenomena are less predictable. While spatial anisotropy of the fields seems to increase their predictability limits (making them larger than the commonly expected eddy turnover times), the ratio of anisotropic to isotropic predictability limits is independent of q. Our results indicate that reliable storm-scale weather forecasting with around 3 to 5 hours lead time is theoretically possible.
13
De Nadai M, Cardoso A, Lima A, Lepri B, Oliver N. Strategies and limitations in app usage and human mobility. Sci Rep 2019; 9:10935. [PMID: 31358830 PMCID: PMC6662905 DOI: 10.1038/s41598-019-47493-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/03/2019] [Accepted: 07/12/2019] [Indexed: 12/01/2022]
Abstract
Cognition has been found to constrain several aspects of human behaviour, such as the number of friends and the number of favourite places a person keeps stable over time. This limitation has been empirically defined in the physical and social spaces. But do people exhibit similar constraints in the digital space? We address this question through the analysis of pseudonymised mobility and mobile application (app) usage data of 400,000 individuals in a European country for six months. Despite the enormous heterogeneity of apps usage, we find that individuals exhibit a conserved capacity that limits the number of applications they regularly use. Moreover, we find that this capacity steadily decreases with age, as does the capacity in the physical space but with more complex dynamics. Even though people might have the same capacity, applications get added and removed over time. In this respect, we identify two profiles of individuals: app keepers and explorers, which differ in their stable (keepers) vs exploratory (explorers) behaviour regarding their use of mobile applications. Finally, we show that the capacity of applications predicts mobility capacity and vice-versa. By contrast, the behaviour of keepers and explorers may considerably vary across the two domains. Our empirical findings provide an intriguing picture linking human behaviour in the physical and digital worlds which bridges research studies from Computer Science, Social Physics and Computational Social Sciences.
Affiliation(s)
- Marco De Nadai
  - Vodafone Research, Paddington Central, London, W2 6BY, UK
  - Mobs Lab, Fondazione Bruno Kessler, Via Sommarive 18, 38123, Povo, TN, Italy
  - Department of Information Engineering and Computer Science, University of Trento, Via Sommarive, 9I, 38123, Povo, TN, Italy
- Angelo Cardoso
  - Vodafone Research, Paddington Central, London, W2 6BY, UK
- Antonio Lima
  - Vodafone Research, Paddington Central, London, W2 6BY, UK
- Bruno Lepri
  - Mobs Lab, Fondazione Bruno Kessler, Via Sommarive 18, 38123, Povo, TN, Italy
- Nuria Oliver
  - Vodafone Research, Paddington Central, London, W2 6BY, UK
14
Di Clemente R, Luengo-Oroz M, Travizano M, Xu S, Vaitla B, González MC. Sequences of purchases in credit card data reveal lifestyles in urban populations. Nat Commun 2018; 9:3330. [PMID: 30127416 PMCID: PMC6102281 DOI: 10.1038/s41467-018-05690-8] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 08/07/2017] [Accepted: 07/06/2018] [Indexed: 11/09/2022]
Abstract
Zipf-like distributions characterize a wide set of phenomena in physics, biology, economics, and social sciences. In human activities, Zipf's law describes, for example, the frequency of appearance of words in a text or the purchase types in shopping patterns. In the latter, the uneven distribution of transaction types is bound with the temporal sequences of purchases of individual choices. In this work, we define a framework using a text compression technique on the sequences of credit card purchases to detect ubiquitous patterns of collective behavior. Clustering the consumers by their similarity in purchase sequences, we detect five consumer groups. Remarkably, post checking, individuals in each group are also similar in their age, total expenditure, gender, and the diversity of their social and mobility networks extracted from their mobile phone records. By properly deconstructing transaction data with Zipf-like distributions, this method uncovers sets of significant sequences that reveal insights on collective human behavior.
Affiliation(s)
- Riccardo Di Clemente
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - The Bartlett Centre for Advanced Spatial Analysis, University College London, London, WC1E 6BT, UK
- Miguel Luengo-Oroz
  - United Nations Global Pulse, 46th Street and 1st Avenue, New York, NY, 10017, USA
- Matias Travizano
  - GranData, 550 15th Street Suite 36C, San Francisco, CA, 94103, USA
- Sharon Xu
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Bapu Vaitla
  - Department of Environmental Health, Harvard University, 677 Huntington Avenue, Boston, MA, 02115, USA
- Marta C González
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
  - Department of City and Regional Planning, Berkeley, CA, 94720-1820, USA
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720-1820, USA
15
Urkup C, Bozkaya B, Salman FS. Customer mobility signatures and financial indicators as predictors in product recommendation. PLoS One 2018; 13:e0201197. [PMID: 30052681 PMCID: PMC6063431 DOI: 10.1371/journal.pone.0201197] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/16/2017] [Accepted: 07/10/2018] [Indexed: 11/19/2022]
Abstract
The rapid growth of mobile payment and geo-aware systems as well as the resulting emergence of Big Data present opportunities to explore individual consuming patterns across space and time. Here we analyze a one-year transaction dataset of a leading commercial bank to understand to what extent customer mobility behavior and financial indicators can predict the use of a target product, namely the Individual Consumer Loan product. After data preprocessing, we generate 13 datasets covering different time intervals and feature groups, and test combinations of 3 feature selection methods and 10 classification algorithms to determine, for each dataset, the best feature selection method and the most influential features, and the best classification algorithm. We observe the importance of spatio-temporal mobility features and financial features, in addition to demography, in predicting the use of this exemplary product with high accuracy (AUC = 0.942). Finally, we analyze the classification results and report on most interesting customer characteristics and product usage implications. Our findings can be used to potentially increase the success rates of product recommendation systems.
Affiliation(s)
- Cagan Urkup
  - Department of Industrial Engineering, Koç University, Istanbul, Turkey
- Burcin Bozkaya
  - School of Management, Sabancı University, Istanbul, Turkey
- F. Sibel Salman
  - Department of Industrial Engineering, Koç University, Istanbul, Turkey
16
Zhu Y, Imamura M, Nikovski D, Keogh E. Introducing time series chains: a new primitive for time series data mining. Knowl Inf Syst 2018. [DOI: 10.1007/s10115-018-1224-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]
17
Dong X, Suhara Y, Bozkaya B, Singh VK, Lepri B, Pentland A‘S. Social Bridges in Urban Purchase Behavior. ACM T INTEL SYST TEC 2018. [DOI: 10.1145/3149409] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 10/18/2022]
Abstract
The understanding and modeling of human purchase behavior in city environment can have important implications in the study of urban economy and in the design and organization of cities. In this article, we study human purchase behavior at the community level and argue that people who live in different communities but work at close-by locations could act as “social bridges” between the respective communities and that they are correlated with similarity in community purchase behavior. We provide empirical evidence by studying millions of credit card transaction records for tens of thousands of individuals in a city environment during a period of three months. More specifically, we show that the number of social bridges between communities is a much stronger indicator of similarity in their purchase behavior than traditionally considered factors such as income and sociodemographic variables. Our findings also suggest that such an effect varies across different merchant categories, that the presence of female customers in social bridges is a stronger indicator compared to that of their male counterparts, and that there seems to be a geographical constraint for this effect, all of which may have implications in the studies of urban economy and data-driven urban planning.
Affiliation(s)
- Xiaowen Dong
  - Massachusetts Institute of Technology, Cambridge, MA, USA
18
Agarwal RR, Lin CC, Chen KT, Singh VK. Predicting financial trouble using call data-On social capital, phone logs, and financial trouble. PLoS One 2018; 13:e0191863. [PMID: 29474411 PMCID: PMC5825009 DOI: 10.1371/journal.pone.0191863] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/14/2017] [Accepted: 01/13/2018] [Indexed: 12/02/2022]
Abstract
An ability to understand and predict financial wellbeing for individuals is of interest to economists, policy designers, financial institutions, and the individuals themselves. According to the Nilson reports, there were more than 3 billion credit cards in use in 2013, accounting for purchases exceeding US$ 2.2 trillion, and according to the Federal Reserve report, 39% of American households were carrying credit card debt from month to month. Prior literature has connected individual financial wellbeing with social capital. However, as yet, there is limited empirical evidence connecting social interaction behavior with financial outcomes. This work reports results from one of the largest known studies connecting financial outcomes and phone-based social behavior (180,000 individuals; 2 years' time frame; 82.2 million monthly bills, and 350 million call logs). Our methodology tackles a highly imbalanced dataset, which is a pertinent problem in modelling credit risk behavior, and offers a novel hybrid method that yields improvements over both a traditional approach that uses only transaction data and an approach that uses only call data. The results pave the way for better financial modelling of billions of unbanked and underbanked customers using non-traditional metrics like phone-based credit scoring.
Collapse
Affiliation(s)
- Chia-Ching Lin
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Kuan-Ta Chen
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Vivek Kumar Singh
- School of Communication and Information, Rutgers University, New Brunswick, New Jersey, United States of America
- Media Labs, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
|
19
|
Hashemian B, Massaro E, Bojic I, Murillo Arias J, Sobolevsky S, Ratti C. Socioeconomic characterization of regions through the lens of individual financial transactions. PLoS One 2017; 12:e0187031. [PMID: 29190724 PMCID: PMC5708635 DOI: 10.1371/journal.pone.0187031] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/30/2017] [Accepted: 10/12/2017] [Indexed: 11/25/2022] Open
Abstract
People are increasingly leaving digital traces of their daily activities through interacting with their digital environment. Among these traces, financial transactions are of paramount interest since they provide a panoramic view of human life through the lens of purchases, from food and clothes to sport and travel. Although many analyses have studied individual preferences based on credit card transactions, characterizing human behavior at larger scales remains largely unexplored. This is mainly due to the lack of models that can relate individual transactions to macro-socioeconomic indicators. By building these models, we can not only obtain nearly real-time information about the socioeconomic characteristics of regions, which is usually available only yearly or quarterly through official statistics, but also reveal hidden social and economic structures that cannot be captured by official indicators. In this paper, we aim to elucidate how macro-socioeconomic patterns could be understood based on individual financial decisions. To this end, we reveal the underlying interconnection of the network of spending by leveraging anonymized individual credit/debit card transaction data, craft micro-socioeconomic indices that consist of various social and economic aspects of human life, and propose a machine learning framework to predict macro-socioeconomic indicators.
Affiliation(s)
- Behrooz Hashemian
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Emanuele Massaro
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- HERUS Lab, Institute of Environmental Engineering (ENAC), École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Iva Bojic
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
- Stanislav Sobolevsky
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Center For Urban Science and Progress, New York University, Brooklyn, NY, United States of America
- Institute Of Design And Urban Studies of The Saint-Petersburg National Research University Of Information Technologies, Mechanics And Optics, Saint-Petersburg, Russia
- Carlo Ratti
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
|
20
|
Big data analyses reveal patterns and drivers of the movements of southern elephant seals. Sci Rep 2017; 7:112. [PMID: 28273915 PMCID: PMC5427936 DOI: 10.1038/s41598-017-00165-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 10/11/2016] [Accepted: 02/09/2017] [Indexed: 11/18/2022] Open
Abstract
The growing number of large databases of animal tracking provides an opportunity for analyses of movement patterns at the scales of populations and even species. We used analytical approaches, developed to cope with “big data”, that require no ‘a priori’ assumptions about the behaviour of the target agents, to analyse a pooled tracking dataset of 272 elephant seals (Mirounga leonina) in the Southern Ocean, that was comprised of >500,000 location estimates collected over more than a decade. Our analyses showed that the displacements of these seals were described by a truncated power law distribution across several spatial and temporal scales, with a clear signature of directed movement. This pattern was evident when analysing the aggregated tracks despite a wide diversity of individual trajectories. We also identified marine provinces that described the migratory and foraging habitats of these seals. Our analysis provides evidence for the presence of intrinsic drivers of movement, such as memory, that cannot be detected using common models of movement behaviour. These results highlight the potential for “big data” techniques to provide new insights into movement behaviour when applied to large datasets of animal tracking.
|
21
|
Meekan MG, Duarte CM, Fernández-Gracia J, Thums M, Sequeira AMM, Harcourt R, Eguíluz VM. The Ecology of Human Mobility. Trends Ecol Evol 2017; 32:198-210. [PMID: 28162772 DOI: 10.1016/j.tree.2016.12.006] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/04/2016] [Revised: 12/11/2016] [Accepted: 12/15/2016] [Indexed: 10/20/2022]
Abstract
Mobile phones and other geolocated devices have produced unprecedented volumes of data on human movement. Analysis of pooled individual human trajectories using big data approaches has revealed a wealth of emergent features that have ecological parallels in animals across a diverse array of phenomena including commuting, epidemics, the spread of innovations and culture, and collective behaviour. Movement ecology, which explores how animals cope with and optimize variability in resources, has the potential to provide a theoretical framework to aid an understanding of human mobility and its impacts on ecosystems. In turn, big data on human movement can be explored in the context of animal movement ecology to provide solutions for urgent conservation problems and management challenges.
Affiliation(s)
- Mark G Meekan
- Australian Institute of Marine Science, Indian Ocean Marine Research Centre (IOMRC), University of Western Australia (M470), 35 Stirling Highway, Crawley, WA 6009, Australia
- Carlos M Duarte
- King Abdullah University of Science and Technology (KAUST), Red Sea Research Center (RSRC), Biological and Environmental Sciences and Engineering (BESE), Thuwal 23955-6900, Saudi Arabia
- Juan Fernández-Gracia
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Michele Thums
- Australian Institute of Marine Science, Indian Ocean Marine Research Centre (IOMRC), University of Western Australia (M470), 35 Stirling Highway, Crawley, WA 6009, Australia
- Ana M M Sequeira
- IOMRC and UWA Oceans Institute, The University of Western Australia, School of Animal Biology, M470, 35 Stirling Highway, Crawley, WA 6009, Australia
- Rob Harcourt
- Department of Biological Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Víctor M Eguíluz
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), E07122 Palma de Mallorca, Spain
|
22
|
Krumme AA, Sanfélix-Gimeno G, Franklin JM, Isaman DL, Mahesri M, Matlin OS, Shrank WH, Brennan TA, Brill G, Choudhry NK. Can purchasing information be used to predict adherence to cardiovascular medications? An analysis of linked retail pharmacy and insurance claims data. BMJ Open 2016; 6:e011015. [PMID: 28186924 PMCID: PMC5129090 DOI: 10.1136/bmjopen-2015-011015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE The use of retail purchasing data may improve adherence prediction over approaches using healthcare insurance claims alone. DESIGN Retrospective. SETTING AND PARTICIPANTS A cohort of patients who received prescription medication benefits through CVS Caremark, used a CVS Pharmacy ExtraCare Health Care (ECHC) loyalty card, and initiated a statin medication in 2011. OUTCOME We evaluated associations between retail purchasing patterns and optimal adherence to statins in the 12 subsequent months. RESULTS Among 11 010 statin initiators, 43% were optimally adherent at 12 months of follow-up. Greater numbers of store visits per month and dollar amount per visit were positively associated with optimal adherence, as was making a purchase on the same day as filling a prescription (p<0.0001 for all). Models to predict adherence using retail purchase variables had low discriminative ability (C-statistic: 0.563), while models with both clinical and retail purchase variables achieved a C-statistic of 0.617. CONCLUSIONS While the use of retail purchases may improve the discriminative ability of claims-based approaches, these data alone appear inadequate for adherence prediction, even with the addition of more complex analytical approaches. Nevertheless, associations between retail purchasing behaviours and adherence could inform the development of quality improvement interventions.
Affiliation(s)
- Alexis A Krumme
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Jessica M Franklin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Danielle L Isaman
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Mufaddal Mahesri
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Gregory Brill
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Niteesh K Choudhry
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
|
23
|
|
24
|
Emergence of Cooperative Long-Term Market Loyalty in Double Auction Markets. PLoS One 2016; 11:e0154606. [PMID: 27120473 PMCID: PMC4847927 DOI: 10.1371/journal.pone.0154606] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/21/2015] [Accepted: 04/17/2016] [Indexed: 11/19/2022] Open
Abstract
Loyal buyer-seller relationships can arise by design, e.g. when a seller tailors a product to a specific market niche to accomplish the best possible returns, and buyers respond to the dedicated efforts the seller makes to meet their needs. We ask whether it is possible, instead, for loyalty to arise spontaneously, and in particular as a consequence of repeated interaction and co-adaptation among the agents in a market. We devise a stylized model of double auction markets and adaptive traders that incorporates these features. Traders choose where to trade (which market) and how to trade (to buy or to sell) based on their previous experience. We find that when the typical scale of market returns (or, at fixed scale of returns, the intensity of choice) become higher than some threshold, the preferred state of the system is segregated: both buyers and sellers are segmented into subgroups that are persistently loyal to one market over another. We characterize the segregated state analytically in the limit of large markets: it is stabilized by some agents acting cooperatively to enable trade, and provides higher rewards than its unsegregated counterpart both for individual traders and the population as a whole.
|
25
|
Sobolevsky S, Sitko I, Tachet des Combes R, Hawelka B, Murillo Arias J, Ratti C. Cities through the Prism of People's Spending Behavior. PLoS One 2016; 11:e0146291. [PMID: 26849218 PMCID: PMC4743849 DOI: 10.1371/journal.pone.0146291] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 05/30/2015] [Accepted: 12/15/2015] [Indexed: 12/02/2022] Open
Abstract
Scientific studies of society increasingly rely on digital traces produced by various aspects of human activity. In this paper, we exploit a relatively unexplored source of data, anonymized records of bank card transactions collected in Spain by a big European bank, and propose a new classification scheme of cities based on the economic behavior of their residents. First, we study how individual spending behavior is qualitatively and quantitatively affected by various factors such as customer's age, gender, and size of his/her home city. We show that, similar to other socioeconomic urban quantities, individual spending activity exhibits a statistically significant superlinear scaling with city size. With respect to the general trends, we quantify the distinctive signature of each city in terms of residents' spending behavior, independently from the effects of scale and demographic heterogeneity. Based on the comparison of city signatures, we build a novel classification of cities across Spain in three categories. That classification exhibits a substantial stability over different city definitions and connects with a meaningful socioeconomic interpretation. Furthermore, it corresponds with the ability of cities to attract foreign visitors, which is a particularly remarkable finding given that the classification was based exclusively on the behavioral patterns of city residents. This highlights the far-reaching applicability of the presented classification approach and its ability to discover patterns that go beyond the quantities directly involved in it.
Affiliation(s)
- Stanislav Sobolevsky
- Center For Urban Science And Progress, New York University, Brooklyn, New York, United States of America
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Izabela Sitko
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Remi Tachet des Combes
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Bartosz Hawelka
- Department of Geoinformatics - Z_GIS, University of Salzburg, Salzburg, Austria
- Carlo Ratti
- Senseable City Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
|
26
|
Bogomolov A, Lepri B, Staiano J, Letouzé E, Oliver N, Pianesi F, Pentland A. Moves on the Street: Classifying Crime Hotspots Using Aggregated Anonymized Data on People Dynamics. BIG DATA 2015; 3:148-158. [PMID: 27442957 DOI: 10.1089/big.2014.0054] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 06/06/2023]
Abstract
The wealth of information provided by real-time streams of data has paved the way for life-changing technological advancements, improving the quality of life of people in many ways, from facilitating knowledge exchange to self-understanding and self-monitoring. Moreover, the analysis of anonymized and aggregated large-scale human behavioral data offers new possibilities to understand global patterns of human behavior and helps decision makers tackle problems of societal importance. In this article, we highlight the potential societal benefits derived from big data applications with a focus on citizen safety and crime prevention. First, we introduce the emergent new research area of big data for social good. Next, we detail a case study tackling the problem of crime hotspot classification, that is, the classification of which areas in a city are more likely to witness crimes based on past data. In the proposed approach we use demographic information along with human mobility characteristics as derived from anonymized and aggregated mobile network data. The hypothesis that aggregated human behavioral data captured from the mobile network infrastructure, in combination with basic demographic information, can be used to predict crime is supported by our findings. Our models, built on and evaluated against real crime data from London, obtain accuracy of almost 70% when classifying whether a specific area in the city will be a crime hotspot or not in the following month.
Affiliation(s)
- Emmanuel Letouzé
- University of California-Berkeley, Berkeley, California
- Data-Pop Alliance, New York, New York
|
27
|
Money Walks: Implicit Mobility Behavior and Financial Well-Being. PLoS One 2015; 10:e0136628. [PMID: 26317339 PMCID: PMC4552874 DOI: 10.1371/journal.pone.0136628] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 05/13/2015] [Accepted: 08/05/2015] [Indexed: 11/19/2022] Open
Abstract
Traditional financial decision systems (e.g. credit) had to rely on explicit individual traits like age, gender, job type, and marital status, while being oblivious to spatio-temporal mobility or the habits of the individual involved. Emerging trends in geo-aware and mobile payment systems, and the resulting "big data," present an opportunity to study human consumption patterns across space and time. Taking inspiration from animal behavior studies that have reported significant interconnections between animal spatio-temporal "foraging" behavior and their life outcomes, we analyzed a corpus of hundreds of thousands of human economic transactions and found that financial outcomes for individuals are intricately linked with their spatio-temporal traits like exploration, engagement, and elasticity. Such features yield models that are 30% to 49% better at predicting future financial difficulties than the comparable demographic models.
|
28
|
Sapiezynski P, Stopczynski A, Gatej R, Lehmann S. Tracking Human Mobility Using WiFi Signals. PLoS One 2015; 10:e0130824. [PMID: 26132115 PMCID: PMC4489206 DOI: 10.1371/journal.pone.0130824] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 01/20/2015] [Accepted: 05/26/2015] [Indexed: 11/26/2022] Open
Abstract
We study six months of human mobility data, including WiFi and GPS traces recorded with high temporal resolution, and find that time series of WiFi scans contain a strong latent location signal. In fact, due to inherent stability and low entropy of human mobility, it is possible to assign location to WiFi access points based on a very small number of GPS samples and then use these access points as location beacons. Using just one GPS observation per day per person allows us to estimate the location of, and subsequently use, WiFi access points to account for 80% of mobility across a population. These results reveal a great opportunity for using ubiquitous WiFi routers for high-resolution outdoor positioning, but also significant privacy implications of such side-channel location tracking.
Affiliation(s)
- Piotr Sapiezynski
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
- Arkadiusz Stopczynski
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
- Media Lab, Massachusetts Institute of Technology, Cambridge, MA, United States of America
- Radu Gatej
- Department of Economics, University of Copenhagen, Copenhagen, Denmark
- Sune Lehmann
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark
- Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark
|
29
|
Liu Y, Liu X, Gao S, Gong L, Kang C, Zhi Y, Chi G, Shi L. Social Sensing: A New Approach to Understanding Our Socioeconomic Environments. Ann Assoc Am Geogr 2015. [DOI: 10.1080/00045608.2015.1018773] [Citation(s) in RCA: 334] [Impact Index Per Article: 37.1] [Indexed: 10/23/2022]
|
30
|
de Montjoye YA, Radaelli L, Singh VK, Pentland AS. Unique in the shopping mall: On the reidentifiability of credit card metadata. Science 2015; 347:536-9. [DOI: 10.1126/science.1256297] [Citation(s) in RCA: 273] [Impact Index Per Article: 30.3] [Indexed: 11/02/2022]
|