Ralf Haller December 9, 2011

“BIG Data” to collect BIG bugs

Yesterday IBM announced the acquisition of DemandTec for $440 million. The company has hundreds of customers in sectors such as retail and government, and helps them analyze and draw conclusions from the massive amounts of data they collect.

“Big data”, as the name suggests, has to do with masses of data collected from just about everywhere. We are talking about terabytes and more, collected from all kinds of sources such as information-sensing mobile devices, aerial sensory technologies, cameras, microphones, RFID readers, wireless sensor networks, social media, buying patterns and so on. The fact that 90% of the data in the world today was created within the past two years makes this an even bigger challenge, as it is a constantly moving target. The relational databases and desktop statistics/visualization packages in use today cannot deal with this unstructured data. It requires instead massively parallel software running on large computers, and often on grids of dozens, hundreds or even more servers.

It's not surprising that Google has been into this for quite a while, since their search algorithm does exactly what big data is all about: collecting massive amounts of data and making decisions (search) based on analyzing it. Google now offers access to this computing power even to enterprises with its BigQuery service. More on this in this Google blog. Google was also there right at the beginning with its MapReduce framework, which was then used in projects by others such as Yahoo, leading to Hadoop. You can find a story on this here.
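To give a feel for the MapReduce idea mentioned above, here is a minimal single-machine sketch in Python. It is not Google's or Hadoop's actual API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are made up for illustration. A real framework would run the map and reduce steps in parallel across many servers and handle the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for e.g. lines of a web log.
counts = word_count(["big data big bugs", "Big servers"])
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over as many machines as the data requires, which is what makes the approach suitable for the grids of servers described earlier.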

The data collected come, for example, from web logs; RFID sensor networks; social networks; social data; Internet text and documents; Internet search indexing; call detail records; astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and/or interdisciplinary scientific research; military surveillance; medical records; photography archives; video archives; and large-scale eCommerce.

As IBM states, big data spans three dimensions: Variety, Velocity and Volume.

A McKinsey report on Big Data makes these points, among others:

  • The use of big data will underpin new waves of productivity growth and consumer surplus. For example, we estimate that a retailer using big data to the full has the potential to increase its operating margin by more than 60 percent.
  • The computer and electronic products and information sectors, as well as finance and insurance, and government are poised to gain substantially from the use of big data.
  • Policies related to privacy, security, intellectual property, and even liability will need to be addressed in a big data world.