Navigable Slide Index
- Introduction
- The goal of any research project
- Common data problems (1)
- Missing data in a real dataset
- Common data problems (2)
- Inflated frequencies in a real dataset
- Common data problems (3)
- Non-normality
- Distribution options
- Distribution option: Natural log
- Distribution option: Winsorized
- Distribution option: Two-step
- Distribution option: Random-normal
- Distribution option: Truncated
- Distribution option: Power (Box-Cox)
- Distribution option: Ranking (uniform)
- Transformation guidance
- Research model
- Content validity of measures
- Contact details
Topics Covered
- The goal of research
- Common data problems
- Missing data
- Inflated frequencies
- Non-normality
- Common flaws in ameliorating non-normality
- Distributional options
- Data reduction
Talk Citation
Templeton, G. (2017, April 30). Structuring big data [Video file]. In The Business & Management Collection, Henry Stewart Talks. Retrieved November 18, 2024, from https://doi.org/10.69645/MWOZ4696.
Other Talks in the Series: Business Intelligence, Big Data, and Applications in Industry
Transcript
0:00
Hello, my name is Gary Templeton.
I'm an Associate Professor
in Business Information Systems
at Mississippi State University.
This discussion is about
Structuring Big Data.
It is about downloading archival data
or whatever type of data you have
and looking into the data
to see its characteristics,
to see if it is going to be a hindrance
in your analysis.
0:25
It's important to note that
the goal of any research project
is to gain knowledge
and to contribute to the existing
body of knowledge.
And to do that, you have to have
valid measures
and you have to be able
to repeat your analysis
and repeat your findings
in subsequent settings or subsequent samples.
It is a typical quality control problem.
So whereas in the world we have variation,
we are trying to reduce variation
in associations.
It is my hope that after watching this video,
you will be able to enhance
the creation of knowledge in your projects.
1:09
Let us look at common characteristics
of our data
that will reduce our ability
to contribute to knowledge
and specifically that will diminish
statistical power.
To enhance statistical power,
one thing that is essential
is to have a large sample size.
If you have a small sample size,
then that increases variation
as you see in the red line.
And if you have a large sample,
you hope to decrease variation
and enhance effect sizes
and of course statistical power
but also your confidence in your findings.
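[Editor's illustration] The relationship between sample size and variation described above can be sketched with the standard error of the mean, which is the standard deviation divided by the square root of the sample size. This is a minimal sketch and not part of the talk: the function name `standard_error`, the standard deviation of 10, and the sample sizes are all assumed values chosen for illustration.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for a sample of size n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sd / math.sqrt(n)


# Assumed standard deviation, for illustration only.
ASSUMED_SD = 10.0

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(f"n={n:4d}  standard error={standard_error(ASSUMED_SD, n):.2f}")
```

Under these assumptions, a sample sixteen times larger gives an estimate four times tighter, which is one way of reading the contrast between the small-sample and large-sample cases.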
So regarding sample size
and its effect on statistical power,
probably the largest culprit
in reducing sample size
and the most problematic issue
regarding your data is having missing data.
Here is a dataset and it appears as though
someone simply deleted these values.
This can be a problem for each column of data
but also for records, listwise.
In other words, there are certain analyses
that require all variables to have a value.
For example, factor analysis
or regression analysis.
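[Editor's illustration] The difference between missing values per column and listwise loss of records can be sketched as follows. This is a minimal sketch under stated assumptions, not the dataset shown in the talk: the column names, the values, and the helper names `missing_per_column` and `listwise_complete` are hypothetical.

```python
from typing import Optional

Record = dict[str, Optional[float]]

# Hypothetical records; None marks a missing value.
records: list[Record] = [
    {"revenue": 120.0, "assets": 300.0, "employees": 15.0},
    {"revenue": None, "assets": 280.0, "employees": 12.0},
    {"revenue": 95.0, "assets": None, "employees": 9.0},
    {"revenue": 110.0, "assets": 310.0, "employees": None},
    {"revenue": 130.0, "assets": 305.0, "employees": 14.0},
]


def missing_per_column(rows: list[Record]) -> dict[str, int]:
    """Count missing values in each column."""
    counts: dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


def listwise_complete(rows: list[Record]) -> list[Record]:
    """Keep only the records in which every variable has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


complete = listwise_complete(records)
print("Missing per column:", missing_per_column(records))
print(f"Usable records after listwise deletion: {len(complete)} of {len(records)}")
```

In this made-up example each column is missing only one value out of five, yet only two of the five records remain usable for an analysis that requires all variables to have a value.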
So what do you do
about missing data?