Big Data has been the technology industry’s favourite buzzword over the last year. Despite the hype, there is plenty of substance behind it, and it is safe to say that Big Data is here to stay. But should we not step back and ask ourselves what ‘Big Data’ really is in the first place? We need to take a closer look at what Big Data represents in practice and what it can do.
Big Data is best described as the ability to process and structure large quantities of data in a resource-efficient way so as to derive quality and usability from the data itself. According to IBM, 90% of the data in the world today has been created over the last two years. A BBC News report from March 2014 suggests that, in 2012 alone, 2.5 billion gigabytes of data were created in the world every day.
Big Data is also often defined as any quantity of data that is too large to process and analyse with traditional database management tools and data processing applications. Taken together, Big Data can be understood as the analytics process through which we extract value from huge quantities of data. It allows us to derive meaning from large volumes of both structured and unstructured data.
By way of an example, imagine that I am a large retailer that has collected masses of data on the buying behaviour of my customers, and I am keen to use that data to understand their buying trends. I can use Big Data analytics to break down and categorise that data so that I can extract meaning from it. In particular, I can use a Big Data approach to determine how many of my customers shop at weekends rather than on weekdays and, out of those customers, which of them like my end-of-day discounts on fresh produce. I could use Big Data to drill down even further, using the resulting data to see which of my customers like to buy fresh uncut carrots from aisle 3 between the hours of 5 pm and 8 pm on any given day.
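To make the drill-down concrete, here is a minimal sketch in Python. It is purely illustrative: the purchase records, field names and customer IDs are invented, and a real retailer would run this kind of query over far larger data sets on a dedicated analytics platform rather than an in-memory list.

```python
from datetime import datetime

# Hypothetical purchase records; every field name and value is invented for illustration.
purchases = [
    {"customer": "C001", "product": "fresh uncut carrots", "aisle": 3,
     "time": datetime(2014, 3, 8, 17, 30)},   # a Saturday evening
    {"customer": "C002", "product": "fresh uncut carrots", "aisle": 3,
     "time": datetime(2014, 3, 10, 18, 15)},  # a Monday evening
    {"customer": "C003", "product": "milk", "aisle": 7,
     "time": datetime(2014, 3, 9, 11, 0)},    # a Sunday morning
    {"customer": "C001", "product": "bread", "aisle": 5,
     "time": datetime(2014, 3, 11, 9, 45)},   # a Tuesday morning
]


def weekend_shoppers(records):
    """Return the customers with at least one purchase on a Saturday or Sunday."""
    # datetime.weekday() numbers Monday as 0, so Saturday is 5 and Sunday is 6.
    return {r["customer"] for r in records if r["time"].weekday() >= 5}


def evening_carrot_buyers(records):
    """Return the customers who bought fresh uncut carrots from aisle 3 between 5 pm and 8 pm."""
    return {
        r["customer"]
        for r in records
        if r["product"] == "fresh uncut carrots"
        and r["aisle"] == 3
        and 17 <= r["time"].hour < 20
    }


print(sorted(weekend_shoppers(purchases)))
print(sorted(evening_carrot_buyers(purchases)))
```

Each function is simply a filter over the same records; the value of a Big Data platform lies in applying filters like these efficiently when the records number in the billions.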
From this rather basic example one begins to see the way in which the use of Big Data affects us all and its potential benefits.
What is fast becoming clear is that new types of data sets exist today: what we might call ‘sentiment data’ (eg regarding consumer behaviour, such as what leads a consumer to choose one product rather than another), clickstream data, sensor/machine data, geographic data, server logs and unstructured data such as texts, videos and pictures. These are not traditional data, so businesses are exploring these data streams to try to give them structure, in order to make them more usable and therefore more valuable and profitable.
So what if I bought wine?
One may still wonder what the fuss is all about. When your ISP puts together your data usage gathered from your weekly TV viewings and then, in partnership with one of the world’s largest e-mail service providers that routinely scans e-mail content for the purposes of behavioural marketing, sends you targeted e-mails that already seem to determine your shopping trends, particularly your propensity towards the odd Laurent Perrier (which you happen to have purchased as a last-minute gift), you may wonder how this happens. This is Big Data and Customer Relationship Management at work behind the scenes, assimilating customer behaviour and trends.
The daily impact
Big Data can be used in a myriad of other ways: it can be used by fund managers when making pension fund selections; it can be used by search engine providers to aid as well as uncover our Internet surfing habits. The ramifications of processing mass quantities of data so efficiently are extensive and affect us on a day-to-day level.
The significance of Big Data is further demonstrated by the recent report by the US government, ‘Big Data: Seizing opportunities, preserving values’ (published May 2014). This report examines the various uses of Big Data and its impact on economies, briefly touching on the fact that, as with all technology, its use can produce what I refer to as the ‘yin’ and ‘yang’ effect – where some uses may be considered controversial. The use of Big Data is only just gaining momentum, and its application to understanding weather patterns, predictive medicine and, in certain US states, an individual’s propensity to commit crime goes to show the positive, if occasionally controversial, uses of Big Data.
Big Data in the UK
It is clear that businesses and consumers in the fast-paced world that we now live in crave a constant flow of information, and Big Data is one option that satisfies that desire in the background. Big Data platforms are helping to filter the huge volumes of data that UK-based businesses and consumers are increasingly exposed to. The example below helps to clarify the concept of Big Data in the UK.
The NHS
We all know that the NHS has been trying to centralise UK patients’ data for a number of years and, at some point or another, individuals registered with a GP surgery should have received a letter informing them that patient data is being placed on a centralised database. We are told that the centralised database will make it easier and safer to treat patients as each patient’s data will be accessible by medical staff across the country.
In practice this should mean that if, for example, someone was in a motor accident and needed immediate medical attention, medical staff would be able to pull up that person’s medical records and treat them – fully aware of any pre-existing medical conditions. As seamless as the mechanics of this process sound, Big Data has a part to play here. The crunching, analysis and categorisation of data occurs behind the scenes so that the data presented is structured and useful to the clinician/medical person accessing your patient data. Let’s imagine only 1 million Londoners consent to their data being added to the NHS’s centralised system, and that of those 1 million Londoners, 1 person a day suffers an accident – the amount of data that needs to be processed efficiently and effectively so as to deal with those patients is phenomenal.
Other benefits arise from the centralisation of patient data. For example, centralisation of patient data helps to reduce the risk of patient data loss or misuse, whether accidental or otherwise. This is simply due to the fact that it is easier to control the repositories of such large data sets through the imposition of rules on what one can or cannot do with and to such data. The converse is what we have today – where each GP surgery and hospital maintains responsibility for patient data, with differing rules on access, transfer, copy, transportation etc. of such data.
Data Quality
Should one be concerned about Big Data at all? As with all software, the standard saying of ‘garbage in, garbage out’ applies. Often the data being subjected to processing has its own flaws in terms of how it was obtained or whether it was properly categorised to produce the results required. Where organisations house vast quantities of data, it is the organisation’s responsibility to ensure that the integrity of that data is properly assessed, and the data controller must remain compliant with applicable data protection legislation.
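The ‘garbage in, garbage out’ point can be illustrated with a small sketch in Python. The records, field names and list of allowed categories below are assumptions made up for the example; the idea is simply that flawed or miscategorised records are set aside before any analysis is run.

```python
# Hypothetical set of valid categories; an assumption made for this example.
ALLOWED_CATEGORIES = {"produce", "dairy", "bakery"}


def split_by_quality(records):
    """Separate usable records from those with no customer ID or an unknown category."""
    clean, rejected = [], []
    for record in records:
        has_customer = bool(record.get("customer"))
        known_category = record.get("category") in ALLOWED_CATEGORIES
        if has_customer and known_category:
            clean.append(record)
        else:
            rejected.append(record)
    return clean, rejected


# Invented sample data containing two typical flaws.
records = [
    {"customer": "C001", "category": "produce"},
    {"customer": "", "category": "dairy"},        # customer ID was never captured
    {"customer": "C003", "category": "prodcue"},  # miscategorised through a typing error
]

clean, rejected = split_by_quality(records)
print(len(clean), "usable;", len(rejected), "rejected")
```

Rejected records are kept rather than discarded, so that the organisation can trace how the flaw arose and correct it at source.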
Behind the scenes
Data analytics has been around for decades but has gained momentum over the last few years as a result of greater computing power (for example the ability to outsource the storage of mass data to hosting companies) and more efficient technical processes available through technology such as open source software. Through these developments, data analytics has evolved to offer well-tested and stable environments for the processing of mass quantities of data. Many businesses are now using platforms with scalable resources and large computing power to allow them to explore the huge quantities of data now available to them and to structure such data prior to moving it to their data warehouse.
As the extensive power of Big Data becomes more discernible, the privacy considerations become more evident too.
What thought should one give to privacy where Big Data is concerned? This is a subject for a different article, although it is still central to the overall argument presented here. Suffice it to say that there is sufficient legislation regulating the protection of personal data and, to the extent that one can identify the ‘technical and organisational measures’ put in place for the processing of such data, the privacy angle is in reality no different from that of any other form of data processing. At the risk of being repetitive, Big Data is here to stay and the reality is that we need to think about building the use of ‘Big Data’ into our IT strategies today.
Lillian Pang is Legal Director at Rackspace®, the global leader in hybrid cloud and founder of OpenStack®, the open-source operating system for the cloud.