Bayesian Statistics and Data Science Aid in Disaster Relief


Data science makes all of our lives easier. It coordinates the fleet of Uber cars we take to work and tells us what to watch on Netflix when we get home. But it could also save your life. In 2013 alone, 337 natural disasters occurred, resulting in 22,542 deaths and more than $119 billion worth of damage, according to the Centre for Research on the Epidemiology of Disasters. Using data science, we can make those numbers much smaller.

There are two main lines of thinking when it comes to disaster relief. The first centers on what we can do to prepare ourselves before a disaster strikes. Data science is already used a great deal in the realm of disaster preparedness—we can use predictive analytics to build complex weather models that track tsunamis, hurricanes, earthquakes, and so on. But when a disaster strikes, one of the most important factors in recovery is the speed of information. How do we collect a large amount of data, how do we process it very quickly, and most importantly, how do we get it into the hands of the people who need it most?

Take the 2011 tsunami in Japan. We had some advance warning of what was going to happen, but it could have been better. The response since then has been good: we’ve ramped up the number of ocean buoys and ground sensors that collect information about wind and wave patterns, fault lines, and stress. Data science handles the processing of that information. The issue then becomes getting it into the hands of the people who need it.

It’s a hardware problem. Earthquakes are notoriously difficult to predict, but progress is being made. Seismologists are developing something known as the ShakeAlert system, which issues a public warning from the United States Geological Survey when an earthquake is detected anywhere on the West Coast. When a magnitude 6 earthquake hit Napa, California last year, ShakeAlert issued a ten-second warning.

But ShakeAlert’s warning came in the form of a pop-up on a computer screen. “I tried out the alert system on my laptop,” said UC Berkeley professor Joshua Bloom, “but realized that when my laptop was closed or when it wasn’t close by I wouldn’t be alerted.”


In response, Bloom hacked together a simple earthquake alarm for just over $100—a Raspberry Pi single-board computer, a wired speaker, a mini-WiFi adapter, and an SD card, all housed in a cardboard food container from a local restaurant. The alarm received the ShakeAlert warning and informed Bloom without him having to be on his computer.
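Bloom’s device is essentially a small network listener wired to a speaker. A minimal sketch of that pattern in Python follows; note that the feed URL and event format here are hypothetical illustrations, not ShakeAlert’s actual distribution channels:

```python
import json
import time
import urllib.request

# Hypothetical feed URL and JSON schema, for illustration only.
ALERT_FEED = "https://example.org/shakealert/feed.json"

def should_alert(event, seen, min_magnitude=5.0):
    """Sound the alarm only for new events at or above a magnitude threshold."""
    return event["id"] not in seen and event["magnitude"] >= min_magnitude

def poll_for_alerts(feed_url=ALERT_FEED, interval=1.0):
    """Poll the feed; when a new qualifying event appears, raise the alarm."""
    seen = set()
    while True:
        with urllib.request.urlopen(feed_url) as resp:
            events = json.load(resp)
        for event in events:
            if should_alert(event, seen):
                seen.add(event["id"])
                # On a Raspberry Pi this line could instead drive a GPIO
                # buzzer or play a sound through the wired speaker.
                print(f"EARTHQUAKE ALERT: magnitude {event['magnitude']}")
        time.sleep(interval)
```

The decision logic lives in `should_alert` so the same check can run on any hardware that can reach the feed.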

Bloom’s alarm serves as a proof of concept for something he thinks should be a fixture in all potentially at-risk homes. We already have smoke and carbon monoxide detectors; why shouldn’t we have a disaster alert system as well? Ten seconds might not seem like a lot of time, but for many it can mean the difference between life and death. It could give someone the chance to get to a safe part of their building, or to stop operating a piece of heavy machinery.

Collection is Key

We’ve gotten a lot better at predicting disasters, but unfortunately nailing the exact time something is going to happen is nearly impossible. There’s an enormous number of variables in play, and nature is something we still don’t fully understand. Predictive modeling can only get so good, and no matter how prepared we are, disasters still happen. This brings us to the second methodology: what can be done after a disaster strikes.

In the wake of a disaster like the recent earthquakes in Haiti and Nepal, one of the biggest issues is getting aid into the hands of the people who need it most. Infrastructure is down, and people have limited access to the Internet, basic electricity, even water. The currency is information. You’re trying to find people, find supplies, direct aid—how do you coordinate and get people the information they need to act accordingly? It’s here where data science can help a great deal.


“You can apply a lot of the same things that we use to make our economies better,” said Ryan Orban, Galvanize VP of business operations and expansion. “Usually you’re looking at users and whether or not they’re going to churn or leave your site—you can do the same thing, but look at which people are going to be the most responsive to aid, or who are going to need it the most.”

Uber, for example, uses advanced Bayesian models to determine where to place its cars for demand and surge prediction. The same sorts of algorithms can be used to determine where aid teams should be placed throughout Haiti or Nepal. Uber can build these models because it has an enormous amount of data: the real-time positions of its customers and cars, as well as a deep well of historical data. The more data we can collect about aid efforts in progress, the better those efforts can be coordinated and directed to the people who need them most.

“By putting this basic data collection in place, we can be much more prescriptive in how we deploy aid,” Orban said. “We can use advanced Bayesian statistics and spatial analysis to determine what the spots are that need the aid, then use supply chain optimization to make sure things are moving along efficiently.”
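As a toy illustration of the kind of Bayesian reasoning Orban describes (this is not Galvanize’s or Uber’s actual model), a simple Gamma-Poisson update can rank regions by their expected rate of aid requests, given the daily request counts observed so far:

```python
def posterior_demand(prior_alpha, prior_beta, observed_requests):
    """Gamma-Poisson update: posterior mean rate of aid requests.

    Prior: rate ~ Gamma(prior_alpha, prior_beta), shape/rate parameterization.
    Likelihood: each day's request count ~ Poisson(rate).
    The posterior is Gamma(alpha + sum of counts, beta + number of days).
    """
    alpha = prior_alpha + sum(observed_requests)
    beta = prior_beta + len(observed_requests)
    return alpha / beta  # posterior mean

def rank_regions(regions, prior_alpha=1.0, prior_beta=1.0):
    """Rank regions by posterior expected demand, highest first."""
    scored = {name: posterior_demand(prior_alpha, prior_beta, counts)
              for name, counts in regions.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical daily counts of aid requests per region:
regions = {"north": [12, 15, 9], "south": [2, 1, 3], "coast": [30, 28, 35]}
priority = rank_regions(regions)  # → ["coast", "north", "south"]
```

The prior keeps estimates sensible when a region has only a day or two of data—exactly the regime a fresh disaster zone is in.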

Human Data

A huge resource in data collection across the globe is the power of the crowd. More than a billion people carry smartphones with them every day. When something out of the ordinary happens, the first thing many people do is take out their phone and snap a picture. Many of those pictures are then posted publicly, and many of them are geotagged with location information. That’s a wealth of data that in times of crisis can be used to help those in need.

That’s the idea behind Banjo, a global network that monitors the world’s data signals in real time. The company monitors the public posts of some 1.2 billion people across a variety of social media networks, from standards like Facebook, Twitter, and Instagram to foreign networks such as China’s Weibo and Russia’s VKontakte. Most importantly, Banjo maps everything it listens to—creating feeds of content based upon where the post is coming from rather than around a particular hashtag.

For example, when the Charleston, South Carolina shooting took place last month, Banjo was able to create a feed of posts coming specifically out of the city of Charleston, or the neighborhood where the Emanuel AME Church is located, or even the church itself.

Since Banjo is always listening to the world, it knows what a normal day sounds like—a baseline, so to speak. When something out of the ordinary happens—a blip or spike of posts that deviates from that baseline—Banjo takes notice. Its algorithms can then zero in on the area and figure out what’s happening more or less in real time, often long before media agencies have caught the scent. In fact, many of Banjo’s first paying customers are media outlets such as NBC and ESPN.
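That baseline-and-deviation idea is, at its simplest, classic anomaly detection. A minimal sketch (not Banjo’s actual algorithm): compute a z-score for the latest post count against an area’s historical counts, and flag large deviations:

```python
import statistics

def is_anomalous(history, current, threshold=3.0):
    """Flag a post count that deviates sharply from the baseline.

    `history` is a list of per-interval post counts for one area on
    normal days; `current` is the latest interval's count. Returns True
    when the count sits `threshold` or more standard deviations above
    the historical mean.
    """
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid division by zero
    z = (current - mean) / stdev
    return z >= threshold

# Hypothetical hourly post counts for a quiet neighborhood:
normal_hours = [10, 12, 9, 11, 10, 8]
is_anomalous(normal_hours, 50)  # spike → True
is_anomalous(normal_hours, 12)  # ordinary hour → False
```

A production system would model daily and weekly cycles rather than a flat mean, but the shape of the test is the same.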


Banjo doesn’t just stop at social media either. “Just recently we announced that we’re expanding beyond the social and adding in more of the world’s real-time data signals,” said Banjo CMO Stacey Epstein. “By that I mean things like traffic, weather, point of sale terminals, financial data—eventually we’ll become the world’s only source of real-time data.”

Having all that data in a single stream could be an incredibly powerful tool for disaster relief. As we discussed before, the National Weather Service and other organizations have gotten pretty good at knowing when and where, say, a tornado is going to touch down. What they’re less good at tracking is the level of destruction said tornado leaves in its wake. What’s the impact of the situation? What does it look like on the ground? How much damage is there? How many people are injured? Social media can answer these questions, and Banjo collects those answers into a useful stream that could then be used to help coordinate relief efforts.

“People will go to great lengths to show what it is they’re seeing, especially if it’s out of the norm,” Epstein said. “So in the case of a tornado, you’re immediately going to start seeing photos from around the scene that even the National Weather Service doesn’t have access to.”


The Fight for Data

In order to use data science effectively, you first need access to data. Unfortunately, many of the world’s data streams are either quite difficult to parse or kept under lock and key. Take the case of California’s record-breaking drought, very much a disaster situation: much of the data that would be useful in tracking and battling it is difficult to access. People are incredibly protective of water rights, for example.

“I’m a big believer in open data,” Galvanize’s Orban said. “It’s something the government is finally starting to get around to, but there’s a lot of issues around data provenance, common data formats, making sure things are up to date, and so on. The city of San Francisco and state of California have made great strides in opening up their data, but they still need a lot of help in making it consumable and available in a way that is actually useful to the general public.”

Organizations like Code For America and other open data foundations are on the forefront of educating governments on how to make their data available. “If the data is out there,” Orban said, “it behooves us to make it available, and make it useful.”


“But there might be certain cases where spending $50 million to build your own network might be more effective,” Orban said. That’s the route some of the large tech companies like Google and Facebook have taken. They tried working with municipalities, but things were moving too slowly, so they built their own data collection apparatuses.

“Which is a shame,” Orban said. “It’s ultimately our data, and I think we have to fight for the right to access that data in a responsible way. If we continue to go down the path of continuously privatizing this data into the hands of large multinational corporations, it’s going to be much harder for us to claw that back.”

Interestingly, gaining access to data is an area where Banjo excels. “We have valuable data to share,” Banjo’s Epstein said, “and the combination of our data and their data is incredibly insightful for both parties.” Financial trend data, for example, is usually kept closely guarded. But financial firms have a lot to gain by partnering with Banjo. When an oil pipeline exploded in Saudi Arabia, Banjo informed its financial partners hours before the media reported the explosion, giving them a considerable heads-up on which way the market was going to shift.

This same data could be used to help in a disaster situation. If Banjo has access to the point-of-sale terminals in local businesses, it could track which Target stores are running low on water, or whether a local Wal-Mart is well stocked with first-aid supplies. Meanwhile, images taken at the scene could help a hospital know whether it’s going to be dealing with more broken bones or burn injuries. Banjo itself isn’t in the disaster relief industry—it’s simply interested in building a platform to collect and analyze all this data—but the applications could save a lot of lives.
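To make the point-of-sale idea concrete, here’s a hypothetical sketch: given inventory snapshots keyed by store, flag the stores whose stock of an item has fallen below a fraction of shelf capacity. The data shape is invented for illustration, not any real retailer’s or Banjo’s API:

```python
def low_stock_stores(pos_snapshots, item, threshold=0.2):
    """Flag stores whose stock of `item` is below `threshold` of capacity.

    `pos_snapshots` maps store name -> item -> {"stock": units on hand,
    "capacity": shelf capacity}. Returns the list of flagged store names.
    """
    flagged = []
    for store, inventory in pos_snapshots.items():
        levels = inventory[item]
        if levels["stock"] / levels["capacity"] < threshold:
            flagged.append(store)
    return flagged

# Hypothetical snapshot of two stores' water supplies after a disaster:
snapshot = {
    "Target (downtown)": {"water": {"stock": 12, "capacity": 200}},
    "Wal-Mart (north)": {"water": {"stock": 150, "capacity": 200}},
}
low_stock_stores(snapshot, "water")  # flags only the downtown Target
```

Fed into the same spatial models used for placing aid teams, a list like this tells relief coordinators where resupply trucks should go first.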