Hello, my name is Gary Templeton.
I'm an Associate Professor
in Business Information Systems
at Mississippi State University.
This discussion is about
Structuring Big Data.
It is about downloading archival data
or whatever type of data you have
and looking into the data
to see its characteristics,
to see if it is going to be a hindrance
in your analysis.
It's important to note that
the goal of any research project
is to gain knowledge
and to contribute to the existing
body of knowledge.
And to do that, you have to have
as well as you have to be able
to repeat your analysis
and repeat your findings
in subsequent settings or subsequent samples.
It is a typical quality control problem.
So whereas in the world we have variation,
we are trying to reduce variation.
It is my hope that after watching this video,
you will be able to enhance
the creation of knowledge in your projects.
Let us look at common characteristics
of our data
that will reduce our ability
to contribute to knowledge
and specifically that will diminish
statistical power.
To enhance statistical power,
one thing that is essential
is to have a large sample size.
If you have a small sample size,
then that increases variation
as you see in the red line.
And if you have a large sample,
you hope to decrease variation
and enhance effect sizes
and of course statistical power
but also your confidence in your findings.
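
As a rough illustration of the point about sample size, the short Python sketch below computes the standard error of a mean, which is the sample standard deviation divided by the square root of the sample size. The function name and the standard deviation of 10 are assumptions made for this example; they do not come from the lecture.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the mean: sample_sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sample_sd / math.sqrt(n)


# Hypothetical standard deviation of 10, chosen only for illustration.
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))
```

Under these assumptions, quadrupling the sample size halves the standard error, which is one way to see why larger samples tend to give tighter estimates.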
So regarding sample size
and its effect on statistical power,
probably the largest culprit
in reducing sample size
and the most problematic issue
regarding your data is having missing data.
Here is a dataset and it appears as though
someone simply deleted these values.
This can be a problem for each column of data
but also for records, listwise.
In other words, there are certain analyses
that require all variables to have a value.
For example, factor analysis
or regression analysis.
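
To make the column-versus-listwise distinction concrete, here is a minimal Python sketch. It counts missing values per column and then keeps only the records that have a value for every variable, which is what a listwise analysis would actually use. The variable names (x1, x2, y), the tiny dataset, and the use of None to mark a missing value are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def missing_per_column(records: List[Record]) -> Dict[str, int]:
    """Count how many records are missing a value in each column."""
    counts: Dict[str, int] = {}
    for record in records:
        for name, value in record.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts


def listwise_complete(records: List[Record]) -> List[Record]:
    """Keep only records in which every variable has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


# Hypothetical dataset: None marks a missing value.
records: List[Record] = [
    {"x1": 1.0, "x2": 2.0, "y": 3.0},
    {"x1": None, "x2": 2.5, "y": 3.5},
    {"x1": 1.5, "x2": None, "y": 4.0},
    {"x1": 2.0, "x2": 3.0, "y": None},
    {"x1": 2.5, "x2": 3.5, "y": 5.0},
]

print(missing_per_column(records))
print(len(listwise_complete(records)), "of", len(records), "records are complete")
```

In this made-up dataset each column is missing only one value out of five, yet only two of the five records survive listwise deletion, so the usable sample shrinks much faster than any single column suggests.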
So what do you do
about missing data?