Head of Engineering
Interview by Kamran Rafiq (ISNTD) Oct 2013
Patrick Copeland, head of engineering at Google, speaks to the International Society for Neglected Tropical Diseases about Google's involvement in NTDs, global health and emergency response.
With the mobile revolution dramatically empowering users all around the world, particularly in extremely resource-poor settings, what are some of the innovations which Google has developed in terms of disease surveillance, emergency response and global health?
By our recent estimates, mobile usage has surpassed desktop usage. In the next 3-4 years, internet traffic will double, and more than half of it will be delivered over wireless networks. Better and cheaper devices, rich cloud applications, social media, and emerging markets are all driving this trend. Juxtapose this with the fact that Internet access is still out of reach for the vast majority of people in the world. More than 3.5 billion people live in countries where mobile data costs 10 times what it does in the US, relative to gross national income (GNI).
Google is interested in making the internet available to more people. We believe that access to information improves people's lives and creates economic opportunity. Over the last few years we've been investing, both publicly and quietly, in several areas, among them Google Fiber, experiments with TV spectrum in South Africa, Project Loon, and grants to nonprofits working to expand Internet access.
Another example is our investment in Crisis Response (google.org/crisisresponse). Our goal is to provide essential, high-quality, actionable information to more than 2 billion people for every significant crisis event, through better tools and preparedness. We've integrated with many countries' CAP (Common Alerting Protocol) systems to amplify vital information across all Google properties and platforms. Official information is sometimes too slow in these situations, so we're building tools that let those affected contribute actively. The idea is to allow people near a crisis to provide real-time information and organize their own responses using our tools.
What were the main drivers behind focusing specifically on dengue fever?
We started the project by focusing on flu surveillance, using a methodology that works well for endemic seasonal diseases. We also started with flu because the queries are high-volume and the topic is of broad interest. For "unexpected" disease outbreaks, query volume tends to be driven mainly by media coverage, which makes it very hard to predict from trends.
We selected dengue fever because it is mostly an urban disease, unlike malaria or cholera, which have a similar endemic profile but tend to be far more rural. Since queries in the developing world come mainly from cities, dengue was a particularly good fit. More generally, when considering additional diseases we look for: 1) a consistent volume of search traffic from many countries and cities, 2) an identifiable pattern of seasonal searches, 3) a large at-risk population, and 4) an illness where prevention is a primary response.
Given the susceptibility of the algorithm used within Dengue Trends (DT) to spikes in search query volume, what improvements have been made to DT to counter this?
The original algorithm was built to dampen short-term spikes, sudden changes in any one location, and large numbers of reports from any one individual. We use a heuristic that roughly says very short-term changes are unnatural, so they are removed from the estimate. If the window is too large we flatten real signals, and if it's too short we miss some spikes. During the last flu season we saw a sustained, long-term spike that fell outside our detection window, and we assumed it was a real signal.
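The windowing trade-off described above can be illustrated with a minimal sketch. It assumes a rolling median over the previous `window` observations as the baseline and flags any point more than `threshold` times that baseline; `dampen_spikes`, `window`, and `threshold` are hypothetical names and values, and the interview does not say which filter Dengue or Flu Trends actually uses.

```python
from statistics import median


def dampen_spikes(series, window=3, threshold=2.0):
    """Return a copy of `series` with short-lived spikes replaced by the baseline.

    Hypothetical heuristic: a point is a spike if it exceeds `threshold`
    times the median of the previous `window` raw observations.
    """
    cleaned = list(series)
    for i in range(window, len(series)):
        # Baseline from the raw (uncleaned) history, so a sustained rise
        # eventually fills the window and stops being flagged.
        baseline = median(series[i - window:i])
        if baseline > 0 and series[i] > threshold * baseline:
            cleaned[i] = baseline
    return cleaned


short_spike = [10, 10, 10, 60, 10, 10]
sustained_rise = [10, 10, 10, 10, 50, 50, 50, 50]
print(dampen_spikes(short_spike))     # [10, 10, 10, 10, 10, 10]
print(dampen_spikes(sustained_rise))  # [10, 10, 10, 10, 10, 10, 50, 50]
```

With `window=3`, the one-week spike is removed entirely, while the sustained rise is suppressed for two weeks and then accepted once it dominates the window. This is how a season-long shift can slip past a short-window filter and be counted as signal; a longer window would hold it back longer but would also flatten genuine rises.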
The paper has an interesting chart that shows historical media coverage against our error rate. Overall, our response has been less about handling spikes in query volume (which we do fairly well) than about adapting to season-long behavior changes caused by a) a flu that behaves differently and b) social effects on how people treat and describe their flu. This is an area of ongoing research. Since 2009 we've made several improvements:
Timing of flu outbreaks has changed: 2009 H1N1 started during the Northern Hemisphere summer.
Since 2009 we've included more query terms to better fit the changes we've seen in recent seasons.
Complications from H1N1 were different from those of typical flu, so search terms changed.
Health-seeking behavior (i.e., doctor vs. hospital) has changed.
Is there more that existing disease surveillance and outbreak control centers at the country level can do to help mitigate spikes in query volume that might affect the predictive quality of DT? Is there room for collaboration with Centres for Disease Control across the globe? If so, what form would it take?
Today we use CDC ILI data (doctor visits for influenza-like illness) as our source of truth. We're considering including hospital visits and other measurements that cover a broader range of demographics. Countries with real-time reporting systems will have an advantage in getting ahead of emerging disease trends. We've heard of several governments creating aggregation systems that let them make predictions similar to Google Flu Trends.
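To make the "source of truth" idea concrete, here is a sketch of calibrating a query-based signal against ground-truth ILI rates. It assumes a single linear fit in logit space, logit(ILI) = b0 + b1 · logit(query fraction), solved by ordinary least squares; the function names, the synthetic data, and the model form are assumptions for illustration, not a description of Google's production model.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_logit_linear(query_frac, ili_frac):
    """Ordinary least-squares fit of logit(ili) = b0 + b1 * logit(query)."""
    xs = [logit(q) for q in query_frac]
    ys = [logit(p) for p in ili_frac]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def estimate_ili(b0, b1, query):
    """Predict the ILI proportion for a new week's query fraction."""
    return inv_logit(b0 + b1 * logit(query))


# Synthetic weekly data (hypothetical): query fraction vs. ILI visit proportion.
weekly_queries = [0.001, 0.002, 0.004, 0.008]
weekly_ili = [inv_logit(0.5 + 1.2 * logit(q)) for q in weekly_queries]
b0, b1 = fit_logit_linear(weekly_queries, weekly_ili)
print(round(b0, 6), round(b1, 6))  # 0.5 1.2
print(estimate_ili(b0, b1, 0.005))
```

Adding hospital visits or other ground-truth sources, as mentioned above, would change the target series being fitted rather than the fitting step itself.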
In general, we're very interested in finding ways to share our data and let others replicate and challenge our ideas. One interesting area of research could be pattern detection over time: looking at how disease spreads across time and location using similar digital techniques.
We also believe that social signals are an interesting direction. Our model was built around an implicit "person" (though still anonymous and aggregated) who tells us "I feel sick." New research could combine personal data, media, and social signals (Twitter, Facebook, Google+, and user reports). Connecting our trend data to a social graph may reveal immediacy and trajectory at a small scale. An interesting application in this area is Flu Near You.
We featured insecticide resistance mapping at ISNTD Bites 2013. Which areas aligned to NTDs, but outside straightforward disease mapping, are of interest? Are you actively looking for collaborators?
Perhaps. We get many requests for collaboration from governments, research institutions, and individuals building apps (not just in the disease outbreak space). We think there's huge potential in digital surveillance of other areas (e.g., climate, violence).
Our current goal is to find ways to liberate our data further, to make it easier to reproduce our model and to improve upon it, and to make correlation exploration easier. We currently allow researchers to download data on a daily basis from Flu and Dengue Trends.
But there are a few limitations: a) we currently decline to share search terms, out of concern that public knowledge of those terms could make the model susceptible to gaming, and b) Google Correlate does not provide regional correlation. This is partly due to Google's strict privacy protections: the narrower the region, the easier it is to identify an individual's search habits. We're interested in easing these limitations, but that could also mean changing our algorithm. Stay tuned...
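The privacy concern about narrow regions can be sketched with a simple minimum-count rule: counts are reported for a region only if enough events fall into it, so coarser regions can show data that finer ones must withhold. `aggregate`, the `(country, city)` region paths, and the `min_count` threshold are hypothetical; this illustrates the general principle, not Google's actual privacy protections.

```python
from collections import Counter


def aggregate(events, level, min_count):
    """Count events per region prefix, withholding sparse regions.

    `events` holds region paths such as ("BR", "Recife"); `level` picks
    how many path components to keep (1 = country, 2 = city).
    Regions with fewer than `min_count` events are dropped entirely.
    """
    counts = Counter(path[:level] for path in events)
    return {region: n for region, n in counts.items() if n >= min_count}


events = [("BR", "Rio de Janeiro")] * 120 + [("BR", "Recife")] * 30
print(aggregate(events, level=2, min_count=50))  # {('BR', 'Rio de Janeiro'): 120}
print(aggregate(events, level=1, min_count=50))  # {('BR',): 150}
```

At city level the sparse region is withheld, while the country-level view still reports all 150 events. Finer regional correlation would require lowering such thresholds, which is exactly where the privacy trade-off appears.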
Which disease-states is Google likely to move into?
We are open to expanding coverage of Flu and Dengue Trends. We've had some government level discussions. One challenge for us is “scaling” to meet all of the needs. The discussion often takes a direction of more collaborative work, and that depends on having a more open data model (see previous question). In addition we receive requests to look at specific baselines. We are interested in developing a framework where everyone within the same baseline equivalence class can share results and leverage each other's data more easily.