Fundamentals of data analysis

Published on May 29, 2017   25 min
Hello, my name is Dr. Brian Blank and I'm a finance faculty member at Mississippi State University. Today I will be discussing the fundamentals of data analysis. Specifically, we will focus on how to use large datasets once the data have been gathered and cleaned.
Lately, one of the topics frequently covered by the media and researchers alike is the use of big data. Because data are becoming increasingly prevalent, being able to analyze large datasets is more important than ever. A dataset is simply a collection of observations, or individual bits of information, coded together in a uniform manner. Using a large dataset allows us to learn more about reality: thousands of observations can be used to understand what is taking place in the outside world. This is possible because of the large quantities of computational power now available with modern technology. As datasets become larger and more common, they will be used more frequently, and data analysis will be more informative than it has been before. With more data we can learn more, and this will add value for professionals in various industries.
Before we begin, I want to talk about some of the ideas that we will be discussing today. We'll first discuss ways to summarize our data. Summary statistics, which are also sometimes referred to as descriptive statistics, can be used to help us understand the data that we'll be analyzing. These statistics describe characteristics of the data and give us an idea of which observations are or are not included in our dataset. Next, we'll spend time analyzing data and summarizing the relation between variables. In particular, univariate analysis examines how a single variable relates to our variable of interest. As a result, we'll begin by reviewing how two variables are correlated, which tells us how changes in one variable relate to changes in another. Then we will incorporate additional variables and perform multivariate analysis. Multivariate analysis allows us to incorporate other information that could be related to our variable of interest, and it provides additional confidence in our results by controlling for the effects of these other variables. For the purposes of our discussion today, we will focus on financial information, that is, information about a firm that comes from its financial statements. However, most of the principles are also easily applicable to other forms of data. An example of multivariate analysis using financial information could be a situation where we want to know if it is more profitable for a firm to hold higher levels of debt in order to finance the firm's assets. In that case, we may also want to consider the size of the firm, even if it is not our main variable of interest. Size is important because large and small firms may have other differences between them that could also influence the firm's profitability and debt holdings. We'll discuss these ideas more later on in our analysis.
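To make the debt, size, and profitability example a little more concrete, here is a minimal sketch in Python of the three steps just outlined: summary statistics, a two-variable correlation, and a regression that controls for size. Everything in it is an assumption made for illustration. The eight firms are synthetic, the variable names (`size`, `debt`, `roa`) and helper functions are hypothetical, and profitability is built by construction to rise with size and fall with debt, so we know in advance what a correct analysis should find. It is not drawn from real financial statements.

```python
from statistics import mean, stdev

# Hypothetical firms: synthetic numbers, NOT real financial data.
size = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # stand-in for log of total assets
wiggle = [0.02, -0.02] * 4                         # keeps debt from being an exact function of size
debt = [0.10 + 0.05 * s + w for s, w in zip(size, wiggle)]   # larger firms hold more debt
# Assumed relation, chosen for illustration: profitability rises with size, falls with debt.
roa = [0.02 + 0.01 * s - 0.05 * d for s, d in zip(size, debt)]


def summarize(values):
    """Basic summary (descriptive) statistics for one variable."""
    return {"n": len(values), "mean": mean(values), "std": stdev(values),
            "min": min(values), "max": max(values)}


def pearson(x, y):
    """Pearson correlation coefficient between two variables."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ols(y, *regressors):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*regressors)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


for name, values in (("size", size), ("debt", debt), ("roa", roa)):
    print(name, summarize(values))

r = pearson(debt, roa)          # two-variable view: debt vs. profitability
simple = ols(roa, debt)         # one explanatory variable
multi = ols(roa, debt, size)    # same question, controlling for size

print("correlation(debt, roa):", r)
print("debt slope, no controls:", simple[1])
print("debt slope, controlling for size:", multi[1])
```

In this constructed dataset the raw correlation between debt and profitability is positive, simply because large firms hold more debt and are also more profitable. Once size is included, the coefficient on debt turns negative, which is the relation that was built into the data. In practice one would more likely use a statistics package than hand-written elimination; the point of the sketch is only to show why controlling for another variable can change the conclusion.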