Big data collection and dataset cleaning

Published on August 31, 2017   9 min
Hello. This presentation is about data preparation, sometimes also referred to as data cleaning or scrubbing or, less euphemistically, as data janitor work. I'm Matt Wong, CEO of Liquidaty, a New York City-based startup company.
Often the biggest challenge when working with data is not its availability, quality, volume, or accessibility, but rather the arduous task of preparing the data before it can be used for the particular purpose at hand. To address this challenge, corporations may resort to hiring full-time employees whose sole job is to prepare data that resides in one place so that it can be moved to another place for processing. This activity in general is what I will refer to as data preparation.
So what exactly does that mean? Well, you've probably heard of new fields in technology commonly referred to as big data or machine learning. One thing that these new technologies have in common is that they generally consume vast quantities of data. If there is one thing that you take away from this presentation, remember this: just as we humans prefer much of our food to be cooked before we eat it, data-consuming technologies typically need their data to be prepared, often in a very specific manner, before it can be consumed.