Big Data: the next big thing?
So what is Big Data?
Every day an increasing amount of information is gathered and stored by all manner of websites, companies, organisations and products. We record data on everything from weather stations to shopping habits, and from social media to traffic trends. The volume of this data is growing at an alarming rate as we develop ever more complex programs that record data down to the smallest detail for analysis, reporting and business intelligence. The types of data we can record are also evolving.
Every day, we create 2.5 quintillion bytes of data
The growing need to identify trends means that it is no longer enough to store an individual customer's details: data for all customers must now be accessed quickly and analysed as a whole, to find trends and patterns across the entire collection, whether that means thousands, millions or trillions of values.
90% of the data in the world today has been created in the last two years alone
Structured/Unstructured Data
Data also no longer means only values and figures that can be represented in spreadsheets and graphs, as was previously favoured. “Data” can now be stored as images, documents, emails, text messages, audio, video, etc. These data types are commonly known as “non-structured” or “unstructured”, i.e. they cannot be stored and analysed in a typical relational database. This is the real issue faced by today's companies: the data they have stored cannot be analysed in the traditional way, but simply ignoring it would be foolhardy.
Many organizations are becoming overwhelmed with the volumes of unstructured information — audio, video, graphics, social media messages — that falls outside the purview of their “traditional” databases. Organizations that do get their arms around this data will gain significant competitive edge
For example, in the past medical records were held in vast paper files containing X-rays, reports, doctors' notes, etc. Now those X-rays are recorded electronically and can be accessed from a doctor's PC without the need for any physical document store. One hospital will store thousands upon thousands of X-rays; a hospital group will have millions. Big Data is the means of retrieving this information easily, and of comparing and analysing it in ways that were never previously possible.
“We are only looking at what we have in our data warehouses, it’s not going to be enough for us to get the insights that we need. If you’re a retailer and you were not using all the information you could to judge your customers’ buying patterns, then the retailer across the street probably will, and they’ll steal your customers. That’s the realization, I think, that drove a lot of people to think that they should be capturing much, much more”
Data Timeline
The other demand on data is timeliness. The demand for “real-time” data is increasing: data that is 24 hours, or even 1 hour, out of date could make the difference between a successful business decision and a failed one. The accuracy of the information received often depends on the age of the data analysed. For example, a hotel needs up-to-the-minute room allocation data in order to receive new customers. If this data were 10 minutes old, a room could be allocated twice, or customers turned away when a room is actually available. For credit card companies, the ability to recognise fraudulent activity quickly and stop a stolen card saves them many thousands of pounds.
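The double-booking risk in the hotel example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a hypothetical in-memory hotel system (`RoomStore`, `book_from_snapshot` and `book_live` are invented names, not any real product's API) in which one front-desk terminal works from an old copy of room availability while another books against live data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoomStore:
    """Hypothetical live store: room number -> guest name (None means free)."""
    rooms: dict = field(default_factory=lambda: {101: None, 102: None})

    def snapshot(self) -> dict:
        # A point-in-time copy; it goes stale as soon as anyone else books.
        return dict(self.rooms)


def book_from_snapshot(store: RoomStore, snapshot: dict, guest: str) -> Optional[int]:
    """Pick the first room that looked free in an old snapshot and write the
    booking without re-checking live data (the unsafe pattern)."""
    room = next((r for r, g in sorted(snapshot.items()) if g is None), None)
    if room is not None:
        store.rooms[room] = guest  # silently overwrites any newer booking
    return room


def book_live(store: RoomStore, guest: str) -> Optional[int]:
    """Check availability against current data at the moment of booking."""
    for room, occupant in sorted(store.rooms.items()):
        if occupant is None:
            store.rooms[room] = guest
            return room
    return None  # genuinely full


store = RoomStore()
old_view = store.snapshot()      # terminal A's view, taken "10 minutes ago"
book_live(store, "Alice")        # terminal B books room 101 in the meantime
room = book_from_snapshot(store, old_view, "Bob")
print(room, store.rooms)         # 101 {101: 'Bob', 102: None}  (Alice's booking is lost)
```

A real booking system would normally guard against this with a transaction or a conditional write that re-checks availability at commit time; the sketch only shows why acting on stale data is dangerous.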
Real-time analytics is the big demand. The Holy Grail is “getting and making effective use of information as it happens.” Whoever can crack this will be the front-runner.
'Data delayed' is 'Data Denied'
Velocity – Often time-sensitive, big data must be used as it is streaming in to the enterprise in order to maximize its value to the business
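Using data “as it is streaming in” usually means evaluating each event the moment it arrives rather than in a nightly batch. The Python sketch below applies that idea to the credit card example: a sliding-window rule that flags a card as soon as it is used more than a set number of times within a short period. The `FraudMonitor` name, the threshold and the window length are illustrative assumptions, not a real fraud-detection model, and the sketch assumes events for each card arrive in timestamp order.

```python
from collections import defaultdict, deque


class FraudMonitor:
    """Flags a card when it makes more than `max_txns` transactions within
    `window_secs` seconds (an illustrative rule, not a real fraud model)."""

    def __init__(self, max_txns: int = 3, window_secs: float = 60.0):
        self.max_txns = max_txns
        self.window_secs = window_secs
        # card id -> timestamps of that card's recent transactions, oldest first
        self.recent = defaultdict(deque)

    def process(self, card_id: str, timestamp: float) -> bool:
        """Handle one event as it arrives; return True if the card should be stopped."""
        times = self.recent[card_id]
        times.append(timestamp)
        # Drop transactions that have slid out of the time window.
        while times and timestamp - times[0] > self.window_secs:
            times.popleft()
        return len(times) > self.max_txns


monitor = FraudMonitor(max_txns=3, window_secs=60)
stream = [("card-1", 0), ("card-1", 10), ("card-2", 15), ("card-1", 20), ("card-1", 30)]
for card, ts in stream:
    if monitor.process(card, ts):
        print(f"ALERT: stop {card} at t={ts}s")  # ALERT: stop card-1 at t=30s
```

Each event costs only a few deque operations, so the check keeps pace with the stream instead of waiting for a batch report.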
So we now understand that “Big Data” is the means of accessing and analysing large quantities of data, regardless of size, format or timestamp.
The next question is: how do you compare millions of images to find trends and patterns? In a traditional “structured” (relational) database this would not be practical, as a relational database is designed to analyse values and figures, not images. For that we need a platform that can query “non-structured” or “semi-structured” data. New technologies are now emerging for storing and analysing these large volumes of structured and non-structured data, but it is still early days.
Platform:
NoSQL (commonly expanded as “Not Only SQL”) is the term used to describe non-relational databases that can handle both structured and unstructured data. There are several contenders for big data databases: some use no SQL at all, while others support parts of SQL but avoid features such as joins.
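To make “non-relational” and “avoid joins” more concrete, this Python sketch contrasts a relational-style lookup, which joins two tables at query time, with a document-style record that keeps everything about one patient under a single key. Plain dicts stand in for a real NoSQL database, and all field names and file paths are hypothetical.

```python
# Relational style: data is normalised into separate tables and
# combined with a join when queried.
patients = [{"patient_id": 1, "name": "A. Smith"}]
xrays = [
    {"xray_id": 10, "patient_id": 1, "file": "scans/10.dcm"},  # hypothetical paths
    {"xray_id": 11, "patient_id": 1, "file": "scans/11.dcm"},
]


def xrays_for_patient_join(patient_id):
    """Emulates SELECT name, file FROM patients JOIN xrays ON patient_id."""
    return [
        (p["name"], x["file"])
        for p in patients
        for x in xrays
        if p["patient_id"] == x["patient_id"] == patient_id
    ]


# Document style: everything about one patient is stored together under one
# key, so a read is a single lookup with no join. Records need not share a
# fixed schema, which suits mixed, unstructured content.
documents = {
    "patient:1": {
        "name": "A. Smith",
        "xrays": ["scans/10.dcm", "scans/11.dcm"],
        "doctor_notes": "Free text, stored as-is.",
    },
    "patient:2": {"name": "B. Jones", "email_thread": ["Re: appointment"]},
}


def xrays_for_patient_doc(patient_id):
    doc = documents.get(f"patient:{patient_id}", {})
    return [(doc.get("name"), f) for f in doc.get("xrays", [])]


print(xrays_for_patient_join(1) == xrays_for_patient_doc(1))  # True
```

The trade-off is that the document model duplicates or nests data to avoid joins, which makes single-record reads cheap but cross-record queries the application's responsibility.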
Contenders:
Apache Cassandra, one of the most widely used NoSQL databases, originally developed at Facebook
Apache Hadoop (to be integrated with SQL Server 2012; more on this in the next blog post)
Amazon SimpleDB
Google Bigtable
MapReduce
MemcacheDB
Project Voldemort (developed at LinkedIn)
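MapReduce, listed above, is a programming model rather than a database: a map step turns each input record into key/value pairs, a shuffle groups those pairs by key, and a reduce step combines each group. Hadoop implements this model across a cluster of machines. The single-process Python sketch below only shows the shape of the model using the classic word-count example; it does not use Hadoop's actual API, and a real job would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data delayed is data denied"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)  # {'big': 2, 'data': 3, 'is': 2, 'delayed': 1, 'denied': 1}
```

Because each map call sees only one record and each reduce call sees only one key, both steps can be spread across machines independently, which is what makes the model suit very large datasets.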
It has been predicted that “Organisations that will leverage the new data types will outperform their peers by 20% by 2015”. If you ignore big data, your competition will not.
Once Big Data takes off, the possibilities it opens up will bring a lot of change. New cars fitted with “smart boxes” that record data about your driving could be accessed by insurance companies, with lower premiums offered to drivers who can prove they are a safer risk. Medical data could be compared across millions of patients to identify high-risk groups for certain illnesses or diseases, and tests developed to pick up the warning signs earlier in those most at risk. GPS data could be analysed to predict traffic volume and flow at peak times, with event data built into our satellite navigation to warn of possible delays, and live streams from traffic cameras sent to our iPhones. Facebook photos could even be used to identify benefit fraudsters.
Underpinning all of this is the realization that time to information is critical to extracting value from data sources that include mobile devices, RFID, the web and a growing list of automated sensor technologies.