Okay. So first of all to get started,
what is big data? You hear this all the time.
It's a buzzword. There's a lot of talk about it,
but let's be clear about it and provide
some background about what we're actually talking about.
People would usually characterize big data with
at least these three characteristics; some people add a fourth.
The first one is volume.
There's so much more data now than there was before.
Every year, more new data is created than existed in all of history up to 1997.
We have so much data now,
and it comes from so many different sources, which historically would've been a problem.
Handling and analyzing this data
becomes an issue in and of itself because there's just so much of it.
We're talking about billions of data points.
That's number 1, volume.
The second point would be velocity.
Not only is there an incredible amount of data, more than there was in the past,
it's also coming in very quickly.
Think about how data streams are always coming in.
There's new data all the time. Twitter feeds, RFID tags, sensors and smart metering.
This data is not only in incredible volume,
but it's coming in very quickly.
Again, accommodating the speed and amount of data
has historically been a big challenge and something that people are working on now,
and there are some exciting developments there.
The last characteristic would be variety.
There are so many different kinds of data now,
especially with the proliferation of the Internet.
We have everything from structured numeric data that comes in database form,
rows and columns, which is what we like because it's easy to analyze,
all the way down to emails and
qualitative data, things like that which have to be processed,
where somebody has to go back and take from them what can actually be used and measured.
So the volume, velocity, and variety of data: this is what big data generally refers to.