IBM Using Big Data to Fight Food-Borne Illness

International Business Machines (IBM) Corporation is leveraging the power of Big Data to fight food-borne illness.

Each year, roughly 1 in 6 Americans (48 million people) contract some type of food-borne illness, according to the Centers for Disease Control and Prevention (CDC). While most people recover without lasting complications, others aren't so lucky. Of those 48 million annual cases, roughly 3,000 are fatal, and that's only counting cases in the United States. Globally, food-borne illness causes even more hospitalizations and deaths.

How IBM is Fighting Food-Borne Illness with Big Data

IBM hopes to better protect the public from food-borne illness by using Big Data to analyze retail scanner data from supermarkets and grocery stores and comparing it with the geographic locations where cases of food-borne illness have been reported.

In a report titled “From Farm to Fork: How Spatial-Temporal Data can Accelerate Foodborne Illness Investigation in a Global Food Supply Chain,” IBM researchers cite the 2011 E. coli outbreak in Europe, which sickened some 4,000 people across 16 European countries, 50 of whom died. The researchers say it took government officials nearly two months to identify the source of the illness. Had officials identified the source sooner, they might have better protected the public by recalling the tainted Egyptian fenugreek seeds earlier.

The good news is that Big Data can limit the damage of future outbreaks by detecting the source of the illness more quickly, as described in IBM's report. The general idea is to gather spatial information on where food products are distributed and sold, along with the regions in which food-borne illness has been reported, and then cross-reference the two data sets to look for patterns and similarities.

IBM researchers describe the process as a “computational technique” that serves two primary functions. First, it helps identify possible sources of contaminated food early in an outbreak, allowing governments and food companies to issue the necessary recalls. Second, it can predict likely contamination sources before an outbreak even begins. This one-two punch should better protect the public from food-borne illness.

“Spatial information of each component in the food distribution and supply chain can be used to define a network relationship between sources of contaminated food, wholesalers, retailers and consumers, and subsequent public health case reports,” explained IBM scientists. “In this study, we demonstrate a new approach to accelerate the foodborne illness outbreak investigation. It is a computational technique that can 1) help identify possible sources of contamination in the early stages of a disease outbreak, or 2) make pro-active predictions on likely contamination sources before the onset of a potential outbreak.”
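
To make the "network relationship" idea more concrete, here is a minimal Python sketch. It is not IBM's actual method, which the article does not detail. It simply assumes the supply chain can be written as a set of supplier-to-customer links and then finds the upstream nodes shared by every retailer tied to a case report. The node names, SUPPLY_EDGES, and both functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical supply-chain links: supplier -> list of direct customers.
SUPPLY_EDGES = {
    "farm_a": ["wholesaler_1"],
    "farm_b": ["wholesaler_1", "wholesaler_2"],
    "wholesaler_1": ["retailer_x", "retailer_y"],
    "wholesaler_2": ["retailer_y", "retailer_z"],
}


def upstream_of(node, edges):
    """Return every node that directly or indirectly supplies `node`."""
    # Invert the links so we can walk from customer back to supplier.
    suppliers = defaultdict(set)
    for supplier, customers in edges.items():
        for customer in customers:
            suppliers[customer].add(supplier)

    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for supplier in suppliers[current]:
            if supplier not in seen:
                seen.add(supplier)
                stack.append(supplier)
    return seen


def common_upstream(retailers, edges):
    """Return the nodes that sit upstream of every listed retailer."""
    upstream_sets = [upstream_of(r, edges) for r in retailers]
    return set.intersection(*upstream_sets) if upstream_sets else set()


if __name__ == "__main__":
    # Assume case reports point to shoppers of retailer_x and retailer_z.
    print(common_upstream(["retailer_x", "retailer_z"], SUPPLY_EDGES))
```

In this toy network, the only supplier feeding both affected retailers is farm_b, so it would be the first candidate for investigators to check. A real supply chain would involve far more nodes, timing information, and uncertainty than this sketch captures.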

The average supermarket carries approximately 42,214 products, according to the Food Marketing Institute (FMI). Whenever a customer buys a product, it's scanned at checkout and the supermarket logs the sale. IBM's goal is to analyze this massive amount of data, checking it against cases of food-borne illness throughout the country. If several regions have experienced the same type of food-borne illness outbreak, IBM will look for patterns, such as grocery stores in those regions that sell the same food products.
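
The cross-referencing step can be illustrated with a small Python sketch. This is a simplified stand-in rather than IBM's algorithm: it assumes scanner logs have already been reduced to the set of products sold in each region, and it scores each product by how much more often it appears in regions with reported cases than in regions without. The region names, products, and the rank_products function are all made up for illustration.

```python
# Hypothetical scanner summary: region -> set of products sold there.
SCANNER_SALES = {
    "north": {"sprouts", "milk", "bread"},
    "south": {"sprouts", "milk"},
    "east": {"milk", "bread"},
    "west": {"milk", "eggs"},
}

# Hypothetical regions with reported cases of the same illness.
CASE_REGIONS = {"north", "south"}


def rank_products(sales, case_regions):
    """Rank products by (share of case regions selling the product)
    minus (share of case-free regions selling the product)."""
    affected = set(sales) & case_regions
    unaffected = set(sales) - case_regions
    products = set().union(*sales.values())

    scores = {}
    for product in products:
        in_affected = sum(product in sales[r] for r in affected)
        in_unaffected = sum(product in sales[r] for r in unaffected)
        hit_rate = in_affected / len(affected) if affected else 0.0
        background = in_unaffected / len(unaffected) if unaffected else 0.0
        scores[product] = hit_rate - background

    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for product, score in rank_products(SCANNER_SALES, CASE_REGIONS):
        print(f"{product}: {score:+.2f}")
```

Here sprouts score highest because they are sold in every region with cases and in none without, while milk scores zero because it is sold everywhere. A score like this only narrows the list of suspects; it does not prove contamination.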

While this bold new initiative may help to reduce cases of food-borne illness, researchers say it's not going to make traditional methods obsolete. Government organizations and communities will still need to conduct interviews and surveys to better identify the source of outbreaks. That said, these traditional methods can be complemented with IBM's Big Data analytics approach to narrow down the list of possible contaminants in just hours. Who knows, perhaps this could save hundreds or even thousands of lives across the globe?

IBM and Columbus Collaboratory Release CognizeR

In related news, IBM and Columbus Collaboratory have released CognizeR, an open-source extension for the R programming language, to broaden access to IBM's Big Data analytics. As a result, IT organizations that use R can now access IBM's Watson services more easily from their native R environment.

Companies and organizations that use IBM's Watson services (e.g., Language Translation, Personality Insights, Tone Analyzer, Speech to Text, Text to Speech, and Visual Recognition) have typically had to reach them through API calls coded in another language, such as Java or Python. With CognizeR, however, organizations can access these services straight from the R environment, simplifying the otherwise complicated process of Big Data analytics.

As you may already know, Watson is IBM's Big Data-powered artificial intelligence computer. Developed in IBM's DeepQA project, which was led by researcher and engineer David Ferrucci, Watson made headline news when it beat champions Brad Rutter and Ken Jennings on the game show Jeopardy!, for which it received the first-place prize of $1 million. Watson was given access to a massive amount of information, consisting of more than four terabytes of structured and unstructured data.

According to a press release issued by IBM, less than 1% of all data is properly analyzed. Many IT organizations collect data, but few use it to its full capacity. When data is not properly analyzed, organizations cannot use it effectively. IBM says its new CognizeR extension will improve Big Data analytics by offering enhanced predictive models along with the ability to analyze more data.

IBM and Big Data

IBM is investing heavily in Big Data projects. From its use of Big Data analytics to identify sources of food-borne illness to its new CognizeR extension, the multinational company continues to push the boundaries of what's possible with Big Data. So, what do you think of IBM's bold approach to fighting food-borne illness with Big Data analytics?

Thanks for reading, and feel free to share your thoughts on Big Data in the comments below.