Podcast | 10.26.2020

Updated 10.18.2023

Caroline Buckee: Can Mobile-phone Data Help Control the Spread of the Coronavirus?

Anonymized location data can help guide strategies for protecting public health in a pandemic.

Caroline Buckee

Can cellphone technologies play a role in controlling the coronavirus pandemic? Knowing how public health policies interact with people’s actual behavior, even at an anonymous population-level view, can help guide the decisions of leaders. Mobile phone location data can reveal large-scale patterns of activity and travel between regions. In this episode, associate professor of epidemiology Caroline Buckee explains how such data—carefully stewarded to ensure individual privacy—can even be used to help predict where outbreaks are likely to flare next.

Harvard Magazine covered the use of mobile phone data in public health in the 2014 feature article, Why “Big Data” Is a Big Deal, and Buckee’s use of the technology to predict the spread of dengue fever in the 2015 article, Big Data Takes on Dengue Fever. During the coronavirus pandemic, she also discussed the parameters that control how rapidly the SARS-CoV-2 virus can spread.

Transcript from the interview (the following was prepared by a machine algorithm, and may not perfectly reflect the audio file of the interview):

A note to our listeners: this episode was recorded on September 22, 2020.

Jonathan Shaw: What are the best public health strategies for controlling the coronavirus pandemic? The answer may be different across countries, or even among adjacent rural and urban areas. How can governments know what is working and what is not? Could mobility data, anonymized information about people’s locations and movements, point toward answers? Welcome to the Harvard Magazine podcast, “Ask a Harvard Professor.” I’m Jonathan Shaw. During today’s office hours, we’ll speak with Caroline Buckee, who is associate director of Harvard Center for Communicable Disease Dynamics and an associate professor of epidemiology at the Harvard T.H. Chan School of Public Health. During the past decade, Professor Buckee has used anonymized, population-level, mobile phone data to track the spread of malaria in Kenya and demonstrated its utility for forecasting dengue fever outbreaks in Pakistan. Currently she co-leads the COVID-19 Mobility Data Network, a coalition of infectious disease epidemiologists from more than a dozen universities, collaborating to understand the spread of the coronavirus pandemic. Welcome Professor Buckee.

Caroline Buckee: Thank you.

Jonathan Shaw: Which state and local governments have you been working with?

Caroline Buckee: So it’s changed over time as the pandemic has kind of waxed and waned in different areas. And of course, a lot of the work that we’re doing is international also, but within the U.S. we’ve been working with New York, Massachusetts, California, Washington, other states in the northeast, Florida, multiple different states and also at the city level. So we’ve been working on multiple local levels as well.

Jonathan Shaw: Since you’re doing this work internationally, are there any particular countries that stood out, in terms of displaying a difference in the way that the disease was spreading?

Caroline Buckee: Well, I think there’s been an enormous variation in the response to COVID-19, and I think some countries have clearly done a better job than others. And I think one of the things that distinguishes countries that have done well is that they have used technology and science in a smart way, and they responded very quickly. So places like South Korea, thinking about mobility data, they have been using very intense digital data that probably would not be acceptable in this country for privacy reasons. But they’ve done a good job of controlling the outbreak. And so I think it’s interesting to look outside the U.S. and see how other countries are managing. And it’s not always high-income countries either; countries like Rwanda have done a fantastic job, very efficient. And again, it speaks to the fact that the other component of this, in public health in general, is trust. So the public trust in the government and the public trust in public health agencies, and so on.

And so there’s this very interesting ecology between the public health agencies, people themselves, their behavioral responses, which we can start to measure using mobile phone data and so on, and how that translates into how the epidemic unfolds. So, I think it is very interesting and there has been really a very distinct difference between countries, in terms of how they’ve approached things.

The other thing that’s really different between different countries is that testing capacity has been really, I think, the biggest decider in terms of how well countries have done. So, countries that have done well, invested in testing early and they tested cases reliably, and turnaround times were relatively fast. And so, they were able to use a data driven approach to contain the outbreak. And in other countries, where it hasn’t gone so well, one of the distinctive features of those countries is that they have not tested well. And so, it’s been almost impossible to know where we are with the outbreak because the number of confirmed cases bears little relation to the actual number of cases circulating in the population. So, that surveillance difference has really changed the course of the epidemic in different parts of the world. And I think that’s been very interesting.

And again, there’s a balance between public health surveillance and state surveillance, and this blurring of the boundary between those two, now that we have these new digital tools. And I think there’s a really important discussion to be had about how we integrate private data, owned by companies and individuals into our public health approach for pandemic preparedness and response in the future.

Jonathan Shaw: And what types of mobile data have you been using?

Caroline Buckee: So, you can divide mobile data into multiple types. So, the most simple is called CDR, call detail records. And those don’t require a smartphone. And they essentially are stored by the operator, whoever you have a SIM card from. The operator stores data every time you use your phone and that data point is associated with a cell tower ID. So, if you know where the cell tower is, then you have an approximate location for the person at that time. And you can build up and infer mobility patterns from those kinds of datasets. We find those are very useful in low- and middle-income countries, where smartphone penetration isn’t as high or in places in rural communities and places like that. And then another kind of data that we analyze is from mobile apps, like Facebook or Google, things like that. And those data streams are more continuous because they are GPS locations.

And then, in those situations we work with the operator. So, the Mobility Data Network has been working with Facebook very closely. So, Facebook has a data team and they aggregate the data for us, to make sure it’s anonymous and safe. And then they provide us with kind of aggregated flows on a population-level of how people are moving around.

And then the third type of data, which is also mobile phone data, is from Adtech. So, this is advertisers that kind of pop up on your phone when you’re using an app. And that kind of data is slightly different than the others. It’s much more difficult to understand exactly what the subscriber base is, but that gets packaged by multiple different organizations, but gives you essentially the same kind of information, which is, for a given area, what is the denominator? So how many people are in that area at a certain point in time? And then, how are people moving around? So how many people in the data are moving between point A and point B over some time window?

So, all of these kinds of information are mobile phone data. We use them slightly differently. They have slightly different properties and we use them in a different context, depending on where we are. But those are the kinds of data streams we use.
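The idea behind call detail records, as described above, is that each phone event carries a cell tower ID, and a sequence of towers for one subscriber implies movement. The following is a minimal Python sketch of that inference, not the method Buckee's group actually uses. The record layout, subscriber IDs, and tower names are invented for illustration.

```python
from collections import Counter

# Hypothetical CDR rows: (subscriber_id, ISO timestamp, cell_tower_id).
# The layout and values are assumptions, not a real operator format.
records = [
    ("s1", "2020-03-01T08:00", "tower_A"),
    ("s1", "2020-03-01T18:00", "tower_B"),
    ("s2", "2020-03-01T09:00", "tower_A"),
    ("s2", "2020-03-01T17:30", "tower_B"),
    ("s3", "2020-03-01T10:00", "tower_B"),
    ("s3", "2020-03-01T12:00", "tower_B"),
]


def tower_flows(rows):
    """Count moves between consecutive towers seen for each subscriber."""
    last_tower = {}
    flows = Counter()
    # ISO-8601 timestamps sort chronologically as plain strings.
    for subscriber, _, tower in sorted(rows, key=lambda r: (r[0], r[1])):
        previous = last_tower.get(subscriber)
        if previous is not None and previous != tower:
            flows[(previous, tower)] += 1
        last_tower[subscriber] = tower
    return flows


flows = tower_flows(records)
print(flows)
```

Only the aggregated counts in `flows` would be shared with researchers; the per-subscriber rows stay with the operator.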

Jonathan Shaw: Are there other kinds of data that could be used to track the spread of infectious diseases?

Caroline Buckee: Yes, absolutely. The other thing that I think has really been interesting to see for this outbreak, that I think will change how we view outbreak containment and pandemic preparedness, is the use of viral genomics and genomics in general. So, by sequencing the virus, which you can now do cheaply and quickly, you can start to understand how transmission is occurring in time and space. So you can, for example, identify the origin of a new viral lineage and link it to a particular importation event. You can start to date the time of emergence of the virus in a particular place and start to see spatial patterns of how the virus is spreading.

And so, one of the things I think is really interesting is how we combine mobility data from mobile phones with viral genomic data. You know, they can each validate the other method. They provide a completely different kind of data to look at some of these epidemic dynamics, and they can provide quite concrete answers about how things are spreading between places, how the epidemic has unfolded. And that’s something that we’re doing. For example, we’ve done a project in Bangladesh, looking at the genomics of samples across the country. And we’ve combined that with mobile phone data to show really how viruses entered the country and then emerged and spread across the country from there.

Jonathan Shaw: And what do these different kinds of data provide, in terms of insights into population-level movements?

Caroline Buckee: Yeah, again, it somewhat depends on the spatial scale of your question, right. So at the beginning of the outbreak, one of the things we wanted to know was, if you have an epidemic in New York, for example, what do we expect for people traveling and spreading the virus when they go? So that uses information about how many people travel between place A and place B. But then there’s another kind of information we might want within New York City, for example. And that’s about how are people moving around their neighborhood. So, what are local movements looking like. The reason that that’s important is that it’s an indicator, in near real time, of how people are behaving. So, when new policies are put in place to try and contain the outbreak, whether it’s school closures or lockdowns, you can actually see those behavioral signatures in the mobile phone data.

So what we’ve been doing is trying to understand the best way to analyze this kind of information and how that actually relates to transmission. Because of course, once it’s aggregated on a population-level, it’s not precisely the same as contact rates that spread the virus, it’s a proxy for that on a larger scale. So that’s how we use it. And the reason that it’s useful is because confirmed cases, hospitalizations and deaths are all significantly lagged with respect to the actual transmission event that led to them. So, we need indicators that give decision-makers more real time insight into what’s happening with transmission. And this kind of population-level behavioral data can really help us to understand that.

Jonathan Shaw: And how do you address privacy concerns about the data that you work with?

Caroline Buckee: This is a huge issue. We’ve been working on privacy protocols for this kind of data, actually, for a long time, for about a decade, since we started working with this kind of information. For obvious reasons, we don’t want to ever be able to identify individuals in the dataset. And so, from the mobile operator point of view, that’s a very important and highly regulated part of their business. And so, they are very conservative about that. And we’ve had very productive developments with algorithms for aggregation, to make sure that that’s really robust and really safe. For example, when we work in Bangladesh or Pakistan, the data never leaves the operator. It’s always behind the firewall. By the time we get it, it’s a very abstract matrix of movements between locations on a population-level. So we only know, for example, that 5,000 subscribers move between here and there, in a day.

So, privacy is really central to this whole aspect of the research. What I think is interesting this year is that we’ve seen really the flood gates open on this kind of information. And so, I think there are some urgent conversations to be had to make sure that we are protecting people’s privacy when we’re using this kind of data for public health. My sense is that after this pandemic, this kind of information will be routinely used for surveillance, for preparedness and for response to outbreaks. So, we need to make sure that the safeguards are in place to regulate and anonymize the data, to make sure that we’re not ever putting individuals at risk and also even communities at risk. So, in some places, there are significant vulnerabilities for some communities around racial, ethnic identity and so on. And we really need to make sure that we’re not blurring the line between public health surveillance and state surveillance. So that’s become a very important part of the Mobility Data Network and some of the work that’s happening in my group.
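The "abstract matrix of movements between locations" can be pictured as a table of origin-destination counts. One common disclosure-control safeguard is to suppress pairs with very small counts before release. The interview does not say which aggregation rules are applied, so the threshold, the function name, and the district labels in this Python sketch are all assumptions.

```python
def suppress_small_flows(flows, min_count=10):
    """Drop origin-destination pairs with fewer than min_count subscribers.

    min_count is an illustrative threshold, not a value from the interview.
    """
    return {pair: n for pair, n in flows.items() if n >= min_count}


# Hypothetical daily counts of subscribers moving between districts.
daily_flows = {
    ("district_1", "district_2"): 5000,
    ("district_1", "district_3"): 4,   # too few people to release safely
    ("district_2", "district_3"): 37,
}

safe_flows = suppress_small_flows(daily_flows)
print(safe_flows)
```

Under these assumptions, the pair with only 4 subscribers is withheld, while the larger flows pass through unchanged.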

Jonathan Shaw: Has mobility data been helpful in assessing the effectiveness of social distancing measures, such as travel restrictions, work-from-home policies, limits on social gatherings and closure of schools, universities, and non-essential retailers?

Caroline Buckee: Yes, I think it’s been very, very helpful. Of course, its utility depends on the nature of the intervention. So, for example, you won’t be able to tell whether masks are working from this kind of information. However, for some of these other interventions, like working from home, travel restrictions and so on, you can dramatically see that there are changes in the data that you can monitor in real time. So, as we’ve been working with policymakers on the state and city level, we’ve found that they have requested this data on a daily level, on a daily basis, because they need to know what’s happening in the neighborhoods that they are trying to protect from the coronavirus. So I think it is very helpful for certain kinds of monitoring and it gives a sense for the overall picture. So, for example, we could see big spikes in movement around Memorial Day weekend, or we could see that there are particular travel routes that are going to be important.

We did a study where we looked at the prevalence of COVID antibodies in pregnant women around New York City. And we found that particular kinds of mobility correlated very, very strongly with seroprevalence in these pregnant women. So that was basically the neighborhoods where we could see commuting patterns, that’s out in the morning, back in the afternoon. Those were the neighborhoods that were hit hardest because that’s where the essential workers were living.

So, we’ve had validation that indeed these data do provide important insights into transmission. I think there’s an important question to ask in the aftermath of the big, severe lockdowns. Now, we’re starting to reopen, and it’s possible that the link between mobility patterns and mobility metrics in the data and the transmission that’s happening in the community, that that link might shift over time. So that’s something that we’re trying to understand and quantify, so that these methods can be used rigorously and in a standardized way moving forwards.

Jonathan Shaw: And what would the reason be for a shift like that? Would it be something like mask use?

Caroline Buckee: Right, so it could be that individual behaviors, that we aren’t able to measure, significantly changed transmission. And so, the only way that we could start to piece that together is if we can do ecological kind of studies, where there are natural experiments happening: in one place, mask wearing is common; in another, it’s not. They may have the same changes in mobility. Do we see differences in COVID-19? Things like that. But yeah, it gets harder below the resolution of these kinds of population dynamics. It gets harder to tease out the actual mechanisms behind changes in transmission. So, I see this as a supplemental source of information that policymakers can use to keep an eye on things and make sense of what might be coming down the line, in two or three weeks.

Jonathan Shaw: Can you distinguish between, say, poor public messaging and poor adherence to rules, such as quarantines? For example, when the pandemic seemed to have struck the East Coast, but spared the U.S. heartland, it appeared that there was less adherence to social distancing in regions with less disease, despite warnings that an outbreak was imminent. But once the disease had taken hold and its seriousness became evident, local populations began to adjust their behavior, irrespective of official advice. Is that something that you saw?

Caroline Buckee: Well, interestingly, we saw sort of the opposite. If you look at the mobility data across the U.S., around the middle of March, when the national emergency was declared, regardless of where you were in the country, regardless of what the local policy was, we saw a massive drop in mobility everywhere. Of course, there was variation and there’s heterogeneity and that’s been a distinctive feature of this epidemic, spatial heterogeneity. But we saw the big drop, almost everywhere, happen in the middle of March, across the U.S.

Now, I think the reality is that different parts of the country have had their epidemic at different times. It was introduced in New York. We had a big outbreak in New York, Seattle, Massachusetts. It has subsequently spread to other parts of the country, including the south. So, at the beginning, when everyone was scared and we saw this big drop in mobility, this big behavior change, in places where there wasn’t much coronavirus, it’s understandable that people would have a different reaction to these stringent policies, compared to places like New York, where it was so evident and it was so obvious that the health system was struggling.

So, I think part of that is to do with having a very big country, that’s very heterogeneous and the epidemic hasn’t been at the same time everywhere. And so, part of that is about messaging, to the extent that the policies need to be very honest about what we’re doing, why we’re doing it, and what’s at risk. So I do think that there... Of course, we’re battling misinformation and politicization of these issues, but there’s also just the kind of inherent problem of having a very large, diverse, heterogeneous country with a staggered epidemic that has hit different communities at different times.

Jonathan Shaw: Is the spatial resolution of some of the data that you use good enough to be able to model rates of new contact between individuals? In other words, people who aren’t together all the time already, as you might see within families.

Caroline Buckee: Well, I don’t think so. There’s a distinction between this kind of data and the data coming from, for example, digital contact tracing apps, right. So, your phone can detect if there are other devices nearby it, and that’s the basis, of course, of these digital contact tracing apps or exposure notifications that have been developed by people like Apple and Google. That’s very different from the population-level aggregate mobility patterns that we’ve been analyzing. And I think it serves a completely different purpose in the context of public health.

The thing that we can do with these population-level metrics is look at problematic places, in general, that require more surveillance, right? So if we have a large epidemic in a certain place like New York, we can measure how well connected New York is to other places. And make the case that you need to really start testing and doing surveillance in city XYZ. And I think that that is important.

Another general phenomenon that we’ve seen across the U.S. is the emptying out of cities. So, we saw that the lockdown came and especially in wealthier suburbs, for example, and in commercial districts, we’ve seen a complete emptying out, where people have gone to rural areas. And that’s happened across the board, with places like Boston and Atlanta losing 10% to 15% of their population in a month or two, as people left the cities. So to the extent that that represents new contact between people who were in the epicenter and other populations, yes, we can look at that. On an individual level, no, we don’t go down to that resolution.

Jonathan Shaw: I see. And what about contact tracing? How are the ethical and privacy concerns balanced in that type of situation? Wherein a person who may have unknowingly shared a subway car or an elevator with an infected individual, has an interest in knowing that that has occurred.

Caroline Buckee: Well, so I think the privacy issues around the contact tracing apps are very different. First of all, generally these are opt-in. So, they’re consent-based platforms that people opt into. So that already is a very different situation than these passively observed data from mobile phones, for example. But also, the exposure notifications that have been developed, for example, the Apple and Google app, those are very conservative about privacy. To the extent that everything is stored on the handset and location is not stored. So, the issue there is that potentially that’s not so useful for contact tracing. Traditional contact tracing is quite intrusive. You learn everything about where that person’s been. You take names and phone numbers, that’s the whole point of it. And so, I think some of these apps are still really in the trial phase. We don’t know how well they’re going to work. We don’t know how many false positives, so spurious exposure notifications, people are going to receive and what that means for uptake and continuing use of them.

So, I think this is a difficult space because traditional contact tracing, as part of an epidemic containment strategy, is a necessary and intrusive part of the health system. Your mobile phone and your personal data and apps are not necessarily. And we think about them in different ways. And indeed, that kind of data is different. The question about who owns the data? How is it managed? Who provides it? Does anybody pay for it? All of those questions are really important, in a way that traditional contact tracing data doesn’t even have to deal with.

Jonathan Shaw: How rapidly can mobility data provide insights? For example, if patterns of movement changed over the Fourth of July, how quickly is that data available, so that leaders can take preemptive measures to prevent the sparks of new outbreaks from spreading?

Caroline Buckee: Well, theoretically, it’s real time. But, of course, you have to aggregate it and process it in a way that provides insights. When we were working at the beginning of the outbreak, we were generally providing our operational partners with daily updates. So it would be every day. And of course, for something like the Fourth of July, we would be ready to go quickly because we know that holidays create a lot of travel.

Jonathan Shaw: Hmm. How can mobility data be useful in forecasting the spread of COVID-19?

Caroline Buckee: So, I think forecasting is very challenging, for any disease. And I think it has been particularly challenging for COVID-19, not only because this is a new infection. So, at the beginning of the outbreak, we didn’t know the basic epidemiological parameters with which to inform our model structures. But subsequent to those early days, I think forecasting has become quite difficult because so much of the epidemic is driven by social and political decisions and behaviors. And so, that’s made it quite hard to predict because it’s no longer simply a biological, epidemiological process, but also a complex sociological one.

So, it’s been challenging. And I think the best forecasting approach has been the ensemble approach that the CDC has coordinated through colleagues at UMass Amherst. They’ve taken lots of different forecasts and tried to come up with a sort of a consensus for what might happen, with uncertainty. So the reason that mobility data is useful for those efforts is because it provides you with, essentially, the spatial dynamics that lead to the epidemiology. So connectivity between places, general behavior, and those are things that you can put into your mechanistic model to help inform it and help make predictions about what might happen next. Either way, it’s a challenging problem and forecasting out more than a couple of weeks is essentially impossible right now, I think.
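To illustrate the general idea of an ensemble, here is a small Python sketch that combines several point forecasts into a consensus value with a crude spread. It is not the procedure used by the CDC or the UMass Amherst team; the model names, the numbers, and the choice of median with a min-max range are assumptions made for the example.

```python
from statistics import median


def ensemble_forecast(forecasts):
    """Combine point forecasts into a median with a simple min-max range."""
    values = sorted(forecasts.values())
    return {"median": median(values), "low": values[0], "high": values[-1]}


# Hypothetical forecasts of new weekly cases from three separate models.
model_forecasts = {"model_a": 120, "model_b": 150, "model_c": 200}

consensus = ensemble_forecast(model_forecasts)
print(consensus)
```

The median keeps one outlying model from dominating the consensus, and the range gives a rough sense of how much the models disagree.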

Jonathan Shaw: In Boston and across the country, for example, the limited reopening of local colleges and universities will presumably lead to higher rates of infection and disease. Are you seeing any trends that suggest a renewed outbreak this fall?

Caroline Buckee: I think there’s no question that college students traveling across the country, coming together, living in dorms, having parties will lead to outbreaks and potentially renewed community transmission. I think the open question is the extent to which universities can contain them. Often, the college is somewhat separate, socially, from the rest of a community. And so that means that there is hope that a university can try and contain local outbreaks, which will happen, but try and keep them to the college and prevent them from spilling out into widespread community transmission. I think we have seen those outbreaks across the country in university settings. And inevitably, some of those will spill over and we’ll see rising cases in some situations. I suspect that it’s not the same as it was back in February and March; people in many places are wearing masks, we have got restrictions on behaviors and where we go and what’s open and what’s not.

So, my hope is that we won’t see resurgence in a really problematic way. But one thing to mention there is that we don’t really know the extent to which seasonality is going to make this much harder for us. Other coronaviruses that circulate are highly seasonal. And so, there’s a possibility that winter is going to be very difficult and that there will be large outbreaks that happen because of seasonal effects and because people are inside and they’re unable to go outside and socialize in parks and things. And that will inevitably lead to more risky behavior. So, I would say there’s still huge uncertainty, but the potential for outbreaks spreading to the community from colleges is certainly there.

Jonathan Shaw: You’ve used mobile phone data in the past to model the spread of diseases such as malaria and dengue fever. Do you anticipate that this kind of data might have utility for tracking the spread of other infectious diseases in the United States, or is the current pandemic a special case in that respect?

Caroline Buckee: I’m convinced that we will be using this kind of data as a public health data stream moving forward. There’s no question of its utility. It will have different kinds of utility for different kinds of pathogen. So for example, if there was an outbreak of HIV or something like this, for a sexually transmitted disease, it’s not going to be used in the same way because the contact patterns are different and the way people behave is less obviously linked on a population-level to the spread of HIV, for example. For something like West Nile virus or a vector-borne disease, the ecological side of the mosquito biology is going to play an important role. So you might use it differently for that reason. But in terms of being a quantified, near real time, data stream, that tells you where everybody is and where they’re moving around, that’s useful for every infectious disease. So, I think now that we’ve had COVID and people have started to understand the utility of this kind of approach, we’re certainly going to see this continue to be used by academics as well as public health agencies.

Jonathan Shaw: Thank you very much for joining us today, Professor Buckee.

Caroline Buckee: Thank you so much for having me.

This episode of Ask a Harvard Professor was hosted by Jonathan Shaw and the season is produced by Jacob Sweet and Niko Yaitanes. Our theme music was created by Louis Weeks. This third season is sponsored by the Harvard University Employees Credit Union and supported by voluntary donations from listeners like you. To support the podcast, visit harvardmagazine.com/supportpodcast. If you enjoyed this episode, please consider rating and reviewing us on Apple Podcasts. Contact us with questions at harvard_magazine@harvard.edu.