Hello, I'm Ramon DeGennaro,
the Haslam College Business Professor
of Banking and Finance
at the University of Tennessee.
Today, I'll be telling you about
Regression with Big Data.
Analysis using huge datasets
including regression analysis
is much the same as it is
with other datasets.
Perhaps the biggest difference
is that Big Data
offers researchers much more rope,
the better to hang themselves.
Modern software and computers
make it easy
to get the research process backwards.
Proper analysis begins with a question.
The researcher decides
which approach might answer it,
then collects the necessary data.
Only then does he or she
estimate a model.
Today, technology and Big Data
tempt us to turn to empirical analysis too soon.
We have a choice.
We can choose the difficult task
of thinking about the problem
with almost no immediate gratification
or we can plunge
into the empirical analysis immediately
because performing analysis
using even sophisticated techniques
requires only a little bit of coding.
Immediate gratification with little
or no thought
is a handy winner in most cases.
Presto, we have a result.
Unfortunately, the result probably won't
provide the answer to our question.
Big Data tips the scales even
further toward mindless computations.
Few people enjoy
sorting through hundreds
or even thousands of variables
to decide which ones will be useful.
It's much easier to resort
to automated modeling techniques
such as the venerable stepwise regression.
The temptation is particularly strong
when we have hundreds
or even thousands of variables.
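To make the danger of automated selection concrete, here is a minimal sketch of one common variant of stepwise regression: forward selection that keeps adding whichever variable has the smallest p-value until none falls below 0.05. The lecture does not specify a particular stepwise rule, so this variant, the 0.05 cutoff, the sample sizes, and the helper names (ols_pvalues, forward_stepwise) are all illustrative assumptions. To stay dependency-light, the p-values use a normal approximation to the t distribution. The outcome y here is pure noise, unrelated to every candidate variable, yet the procedure will typically still "find" some predictors.

```python
import math

import numpy as np


def ols_pvalues(X, y):
    """OLS of y on X plus an intercept; two-sided p-values for X's columns.

    Uses a normal approximation to the t distribution (an assumption made
    here to avoid a SciPy dependency; fine for moderately large n).
    """
    n = X.shape[0]
    Xc = np.column_stack([np.ones(n), X])
    beta, _, _, _ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    dof = n - Xc.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(Xc.T @ Xc)
    t = beta / np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return p[1:]  # drop the intercept's p-value


def forward_stepwise(X, y, alpha=0.05):
    """Hypothetical forward selection: add the lowest-p variable while p < alpha."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining:
        best_p, best_j = 1.0, None
        for j in remaining:
            # p-value of candidate j, given the variables already selected
            p = ols_pvalues(X[:, selected + [j]], y)[-1]
            if p < best_p:
                best_p, best_j = p, j
        if best_j is None or best_p >= alpha:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


rng = np.random.default_rng(0)
n, k = 200, 100
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)  # pure noise: unrelated to every column of X
chosen = forward_stepwise(X, y)
print(f"Stepwise kept {len(chosen)} of {k} pure-noise predictors: {chosen}")
```

With 100 candidates tested at the 5% level, some will look "significant" by chance alone, and the procedure cannot tell them apart from real effects. Only thinking about the question beforehand can do that.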
We can and must do better.
Another problem with Big Data
is that huge numbers of observations
mean tiny standard errors,
so our coefficient estimates
are extremely precise.
It's not clear what that means.
A precise measurement
of the wrong variable,
or one based on bad data
or a bad model, is no help,
and in fact, leads to overconfidence
and policy errors.
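A small simulation can show how a huge sample produces a tiny standard error around the wrong answer. This is a hypothetical sketch, not an example from the lecture: the true effect of x on y is assumed to be 1.0, and an assumed confounder z (built as 0.8 times x plus noise) also drives y. The "bad model" regresses y on x alone and omits z. The conventional OLS standard error shrinks roughly like one over the square root of n, so the estimate becomes very precise, but it centers near 1.8 rather than 1.0.

```python
import numpy as np

TRUE_EFFECT_OF_X = 1.0  # assumed true coefficient on x (illustrative)


def slope_and_se(x, y):
    """Simple OLS of y on x with an intercept; returns slope and its usual SE."""
    n = len(x)
    xd = x - x.mean()
    slope = (xd @ (y - y.mean())) / (xd @ xd)
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    sigma2 = resid @ resid / (n - 2)
    se = np.sqrt(sigma2 / (xd @ xd))
    return slope, se


def simulate(n, rng):
    """Draw n observations and fit the misspecified model that omits z."""
    x = rng.standard_normal(n)
    z = 0.8 * x + rng.standard_normal(n)  # confounder correlated with x
    y = TRUE_EFFECT_OF_X * x + 1.0 * z + rng.standard_normal(n)
    return slope_and_se(x, y)  # z is left out: the "bad model"


rng = np.random.default_rng(42)
for n in (1_000, 100_000, 1_000_000):
    slope, se = simulate(n, rng)
    print(f"n={n:>9,}: slope={slope:.3f}, SE={se:.4f}")
```

As n grows, the standard error falls by roughly a factor of ten for every hundredfold increase in observations, while the estimate stays stuck near 1.8. The confidence interval becomes very narrow and excludes the true value of 1.0, which is exactly the overconfidence the lecture warns about.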