Linear Regression Model

In this post, we will explore the basics of linear regression models, which are essential to predictive analysis. We will first examine scatter plots and the correlation between two quantitative variables. We will then look at simple and multiple linear regression models and discuss predictions.

What you will learn

  1. Understand scatter plots and correlation.
  2. Gain a basic understanding of simple linear regression models.
  3. Build multiple linear regression models.
  4. Know how predictions work with regression models.

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model (yale.edu, para 1).

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0) (yale.edu, para 3).
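As a minimal sketch of fitting such a line in R, the example below uses R's built-in women dataset (heights and weights). This dataset is an assumption chosen here for illustration; it is not the data behind the Yale example. The lm() function estimates the intercept a and the slope b.

```r
# Fit a simple linear regression Y = a + bX with base R.
# Assumption: the built-in `women` dataset stands in for the
# height/weight example; it is not the data from the cited source.
fit <- lm(weight ~ height, data = women)

# Extract the intercept (a) and the slope (b) from the fitted model.
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["height"]]

# Print the fitted equation in the form Y = a + bX.
cat(sprintf("weight = %.2f + %.2f * height\n", a, b))
```

Here height plays the role of the explanatory variable X and weight the dependent variable Y, matching the example in the text.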

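More generally, a regression model can be written as:

Y_i = f(X_i, \beta) + e_i

where:
Y_i = dependent variable
f = function
X_i = independent variable
\beta = unknown parameters
e_i = error terms

For the simple linear regression line above, f(X_i, \beta) = a + bX_i, so the unknown parameters \beta are the intercept a and the slope b.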

The Khan Academy video listed in the references will help you understand error terms (Khan Academy, 2010).
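For the simple linear case, the error term and the least-squares criterion can be written out explicitly. What follows is the standard textbook formulation, using the a and b notation from the line equation above. The symbols \hat{a}, \hat{b}, \bar{X}, \bar{Y}, and n (the number of observations) are introduced here and do not appear in the original text.

```latex
% Error term for observation i
e_i = Y_i - (a + b X_i)

% Least squares chooses a and b to minimize the sum of squared errors
\min_{a,\,b} \sum_{i=1}^{n} e_i^2
  = \min_{a,\,b} \sum_{i=1}^{n} \bigl( Y_i - a - b X_i \bigr)^2

% Resulting estimates of the slope and the intercept
\hat{b} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
               {\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{a} = \bar{Y} - \hat{b}\,\bar{X}
```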

Examining Scatter Plot and Correlation

The scatter plot, with a fitted linear regression line, shows a positive relationship between two quantitative variables, Accuracy (Y) and Counts (X).
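A plot of this kind can be sketched in base R as follows. The Accuracy and Counts data behind the figure are not included in this post, so the built-in cars dataset (speed and stopping distance) is used here purely as a stand-in.

```r
# Assumption: `cars` (speed vs. stopping distance) stands in for the
# Accuracy/Counts data, which is not available in this post.
fit <- lm(dist ~ speed, data = cars)

# Scatter plot of the two quantitative variables.
plot(cars$speed, cars$dist,
     xlab = "Speed (X)", ylab = "Stopping distance (Y)",
     main = "Scatter plot with fitted regression line")

# Overlay the fitted least-squares line on the scatter plot.
abline(fit, col = "red", lwd = 2)

# A positive slope means Y tends to increase as X increases.
slope <- coef(fit)[["speed"]]
print(slope > 0)
```

The sign of the slope gives the direction of the relationship, which is what the fitted line in the plot makes visible.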

Determining correlation coefficient

When we need to examine more than two variables, we can use a scatter plot matrix like the example below. It uses the airquality dataset (New York Air Quality Measurements), as documented on the rdrr.io website.

# Scatter plot matrix of the built-in airquality data, with a smoothed curve in each panel
library(graphics)
pairs(airquality, panel = panel.smooth, main = "airquality data")

While this post focuses on linear relationships through regression models, the relationship between two variables will not always be linear, as the scatter plot matrix shows. For example, the variables Wind and Day appear to show a curvilinear relationship.

Source: https://mathbitsnotebook.com/Algebra1/StatisticsReg/ST2CorrelationCoefficients.html

The correlation coefficient, denoted by r, measures the strength and direction of the linear association between two variables. Its value always lies between -1 and 1. A value of r = 1 or r = -1 indicates a perfect positive or negative linear relationship, and r = 0 indicates no linear relationship.
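To compute r in R, here is a minimal sketch using the same airquality data. The cor() function returns the Pearson correlation by default. The argument use = "pairwise.complete.obs" is included because the Ozone and Solar.R columns contain missing values. The variable names r_matrix and r_wind_day are chosen here for illustration.

```r
# Pearson correlation matrix for all pairs of airquality variables.
# Ozone and Solar.R contain NAs, so use pairwise complete observations.
r_matrix <- cor(airquality, use = "pairwise.complete.obs")
print(round(r_matrix, 2))

# Correlation for a single pair of variables, here Wind and Day
# (neither column has missing values).
r_wind_day <- cor(airquality$Wind, airquality$Day)
print(r_wind_day)

# Every r lies between -1 and 1.
print(all(abs(r_matrix) <= 1))
```

Each entry of the matrix quantifies one panel of the scatter plot matrix, so the direction and strength of every pairwise relationship can be read as a single number.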

To conclude part I: when examining a scatter plot, we are mainly interested in the direction of the relationship (positive or negative) and its strength. If two variables form a perfect linear relationship, all data points fall on a straight line. As the Wind and Day example above shows, a relationship may not be that strong. To quantify the direction and strength of a relationship, we use the correlation coefficient.

References:

Yale.edu (2021). Linear Regression. Retrieved from http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm

Khan Academy (Nov. 15, 2010). The squared error of regression line. Retrieved from https://www.youtube.com/watch?v=6OvhLPS7rj4

See you in part II.
