Structuring big data

Published on April 30, 2017   25 min
Hello, my name is Gary Templeton. I'm an Associate Professor in Business Information Systems at Mississippi State University. This discussion is about structuring big data. It is about downloading archival data, or whatever type of data you have, and looking into the data to see its characteristics, and to see whether any of those characteristics will hinder your analysis.
It's important to note that the goal of any research project is to gain knowledge and to contribute to the existing body of knowledge. To do that, you have to have valid measures, and you have to be able to repeat your analysis and your findings in subsequent settings or subsequent samples. It is a typical quality control problem: the world contains variation, and we are trying to reduce variation in our measures and associations. It is my hope that after watching this video, you will be able to enhance the creation of knowledge in your projects.
Let us look at common characteristics of our data that will reduce our ability to contribute to knowledge, and specifically that will diminish statistical power. To enhance statistical power, one thing that is essential is a large sample size. If you have a small sample size, that increases variation, as you see in the red line. If you have a large sample, you hope to decrease variation and enhance effect sizes, statistical power, and of course your confidence in your findings. So regarding sample size and its effect on statistical power, probably the largest culprit in reducing sample size, and the most problematic issue regarding your data, is missing data. Here is a dataset, and it appears as though someone simply deleted these values. This can be a problem for each column of data, but also for records, listwise. In other words, certain analyses require every variable in a record to have a value, for example factor analysis or regression analysis. So what do you do about missing data?
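Before deciding what to do about missing data, it helps to measure how much of it you have, both per column and listwise. Below is a minimal sketch in Python using pandas; the DataFrame and its column names are hypothetical stand-ins for whatever archival dataset you have downloaded.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with scattered missing values (NaN),
# standing in for an archival download.
df = pd.DataFrame({
    "revenue":   [100.0, np.nan, 250.0, 300.0],
    "employees": [10,    20,     np.nan, 40],
    "industry":  ["tech", "retail", np.nan, "tech"],
})

# Missing values per column
per_column = df.isna().sum()
print(per_column)

# Records that are complete listwise -- the only records usable by
# analyses such as regression or factor analysis without imputation.
complete_cases = df.dropna()
print(f"{len(complete_cases)} of {len(df)} records are complete listwise")
```

Even a modest fraction of scattered missing values can sharply shrink the listwise-complete sample: here, one missing value in each of two different rows removes half the records.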