Topics Covered
- Big data
- Regression analysis
- Understanding data
- Distributional issues
- Independent variables
- Coefficient signs
- Significance
- Overfitting
- Regression with big data
Talk Citation
DeGennaro, R.P. (2017, April 30). Regression with big data [Video file]. In The Business & Management Collection, Henry Stewart Talks. Retrieved January 7, 2025, from https://doi.org/10.69645/HCAS7540
Series: Business Intelligence, Big Data, and Applications in Industry
Transcript
0:00
Hello, I'm Ramon DeGennaro,
the Haslam College of Business Professor
of Banking and Finance
at the University of Tennessee.
Today, I'll be telling you about
Regression with Big Data.
0:14
Analysis using huge datasets,
including regression analysis,
is much the same as it is
with other datasets.
Perhaps the biggest difference
is that Big Data
offers researchers much more rope,
the better to hang themselves.
Modern software and computers
make it easy
to get the research process backwards.
Proper analysis begins with a question.
The researcher decides
which approach might answer it,
then collects the necessary data.
Only then does he or she
estimate a model.
Today, technology and Big Data
tempt researchers
to turn to empirical analysis too soon.
We have a choice.
We can choose the difficult task
of thinking about the problem
with almost no immediate gratification
or we can plunge
into the empirical analysis immediately
because performing analysis
using even sophisticated
empirical methods
requires only a little bit of coding.
Immediate gratification with little
or no thought
is a handy winner in most cases.
Presto, we have a result.
Unfortunately, the result probably won't
provide the answer to our question.
Big Data tips the scales even
further toward mindless computations.
Few people enjoy
sorting through hundreds
or even thousands of variables
to decide which ones will be useful.
It's much easier to resort
to automated modeling techniques
such as the venerable stepwise
regression.
This is particularly important
with hundreds
or even thousands of variables.
We can and must do better.
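The danger of letting software sift through thousands of candidate variables can be illustrated with a small simulation. This is only a sketch under stated assumptions, not anything taken from the talk: the function name, the sample sizes, and the rough 1.96 cutoff for a 5% two-sided test are all hypothetical choices. Every predictor here is pure noise, yet a screen that keeps whatever looks significant will still find dozens of "useful" variables.

```python
import numpy as np


def count_spurious_predictors(n_obs=200, n_vars=1000, t_crit=1.96, seed=0):
    """Count noise predictors that look significant in one-variable regressions.

    All names and defaults are illustrative assumptions. The outcome y and
    every column of X are independent random draws, so any "significant"
    predictor is a false positive.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_obs)
    X = rng.standard_normal((n_obs, n_vars))

    # Correlation of each column of X with y.
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

    # t-statistic of the slope in a simple regression of y on one column.
    t = r * np.sqrt((n_obs - 2) / (1.0 - r**2))
    return int(np.sum(np.abs(t) > t_crit))


n_spurious = count_spurious_predictors()
print(f"{n_spurious} of 1000 pure-noise predictors pass the 5% screen")
```

With a 5% test and 1000 unrelated predictors, roughly 50 false positives are expected. An automated selection routine has no way of knowing that none of them mean anything.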
Another problem with Big Data
is that huge numbers of observations
mean tiny standard errors,
so our coefficient estimates
are extremely precise.
It's not clear what that means.
A precise measurement
of the wrong variable,
or one based on bad data
or a bad model, is no help
and, in fact, leads to overconfidence
and policy errors.
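A short simulation can make the point about precision concrete. It is a sketch with made-up numbers, not the speaker's example: the data-generating process, the coefficient values, and the function name are all assumptions. The true effect of x on y is 1.0, but the model leaves out a related variable z, so the estimated slope settles near 1.5. More observations shrink the standard error, not the error in the model.

```python
import numpy as np


def slope_and_se(n_obs, seed=0):
    """Fit y on x alone when the (assumed) true model also includes z.

    Hypothetical setup: z = 0.5 * x + noise and y = 1.0 * x + 1.0 * z + noise,
    so omitting z pushes the slope on x toward 1.5 instead of 1.0.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_obs)
    z = 0.5 * x + rng.standard_normal(n_obs)
    y = 1.0 * x + 1.0 * z + rng.standard_normal(n_obs)

    # Ordinary least squares for a single regressor plus intercept.
    xc = x - x.mean()
    slope = (xc @ (y - y.mean())) / (xc @ xc)
    intercept = y.mean() - slope * x.mean()

    # Conventional standard error of the slope.
    resid = y - intercept - slope * x
    se = np.sqrt((resid @ resid) / (n_obs - 2) / (xc @ xc))
    return slope, se


small_slope, small_se = slope_and_se(100)
large_slope, large_se = slope_and_se(1_000_000)
print(f"n = 100:       slope = {small_slope:.3f}, se = {small_se:.4f}")
print(f"n = 1,000,000: slope = {large_slope:.3f}, se = {large_se:.4f}")
```

The standard error falls roughly with the square root of the number of observations, so the large sample reports a very tight estimate. It is a very tight estimate of the wrong number.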