The following topic shows you how to check the residuals to see whether your data set meets the three conditions of a multiple regression model.
Meeting the first condition: Normal distribution with mean zero
The first condition to meet is that the residuals must have a normal distribution with mean zero. The upper-left plot shows how well the residuals match a normal distribution: if the residuals fall roughly along a straight line, the normality condition is met. By the looks of this plot, I'd say that condition is met for the ad and sales example. As an alternative check for normality that uses the standardized residuals instead of the regular residuals, look at the upper-right plot. It shows how the standardized residuals are distributed across the various estimated (fitted) values of y. Standardized residuals are supposed to follow a standard normal distribution; that is, they should have a mean of zero and a standard deviation of one.
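If you'd like to see where these numbers come from, here's a minimal Python sketch (standard library only, Python 3.8 or later). It rests on a few assumptions: the data are simulated, not the actual ad and sales data; the predictors `tv` and `paper` and the helpers `solve`, `fit_ols`, and `corr` are hypothetical names; and the standardized residuals use the simple form e / s, where s is the residual standard error (many packages also divide each one by the square root of 1 minus that observation's leverage, which this sketch skips). The upper-left plot is summarized as one number: the correlation between the sorted standardized residuals and matching standard normal quantiles. A value close to 1 means the points fall close to a straight line.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations (X'X) b = X'y."""
    X = [[1.0, *row] for row in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    xtx = [[sum(xi[r] * xi[c] for xi in X) for c in range(p)] for r in range(p)]
    xty = [sum(xi[r] * yi for xi, yi in zip(X, y)) for r in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(coef, xi)) for xi in X]
    return coef, fitted

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = fmean(u), fmean(v)
    suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    return suv / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((c - mv) ** 2 for c in v))

# Simulated (hypothetical) data: sales driven by TV and newspaper ad spending plus normal noise.
random.seed(1)
n, k = 50, 2  # number of observations, number of predictors
tv = [random.uniform(1, 10) for _ in range(n)]
paper = [random.uniform(1, 10) for _ in range(n)]
sales = [5 + 2 * t + 1.5 * w + random.gauss(0, 1) for t, w in zip(tv, paper)]

b, fitted = fit_ols(list(zip(tv, paper)), sales)
resid = [yi - fi for yi, fi in zip(sales, fitted)]
s = math.sqrt(sum(e * e for e in resid) / (n - k - 1))  # residual standard error
std_resid = [e / s for e in resid]  # simple standardized residuals (no leverage adjustment)

# Normal probability plot as one number: sorted residuals vs. standard normal quantiles.
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
qq_r = corr(sorted(std_resid), theoretical)

print("mean of standardized residuals:", round(fmean(std_resid), 6))
print("sd of standardized residuals:", round(stdev(std_resid), 3))
print("normal probability plot correlation:", round(qq_r, 3))
```

With the simple e / s form, the mean comes out at zero (up to rounding error), but the standard deviation lands slightly below one: exactly the square root of (n - k - 1)/(n - 1) for n observations and k predictors.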
So when you look at the standardized residuals, they should be centered on zero with no predictable pattern, and with the same amount of variability around the horizontal line at zero as you move from left to right. The lower-left plots show histograms of the regular and the standardized residuals. These histograms should reflect a normal distribution: approximately symmetric and bell-shaped. If the data set is small, the histogram may not look as close to normal as you would like; in that case, treat it as one part of the body of evidence that all four residual plots give you.
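The histogram check can also be done in plain text. The following is a hypothetical Python sketch: the residuals are a simulated stand-in (random normal draws), so you would swap in your own standardized residuals. The function names `text_histogram` and `skewness` are illustrative, and the bin layout (half a standard deviation wide, from -3 to 3, with outliers clamped into the end bins) is an assumption. The sample skewness is printed as a rough symmetry check; it sits near 0 for a symmetric, bell-shaped histogram.

```python
import random
from statistics import fmean, pstdev

def text_histogram(values, lo=-3.0, hi=3.0, width=0.5):
    """Return (bin_start, count) pairs; values outside [lo, hi) are clamped into the end bins."""
    nbins = round((hi - lo) / width)
    counts = [0] * nbins
    for v in values:
        i = int((v - lo) // width)
        i = min(max(i, 0), nbins - 1)  # clamp extreme residuals into the first/last bin
        counts[i] += 1
    return [(lo + i * width, c) for i, c in enumerate(counts)]

def skewness(values):
    """Sample skewness (population form); near 0 for a roughly symmetric distribution."""
    m, sd = fmean(values), pstdev(values)
    return fmean([((v - m) / sd) ** 3 for v in values])

random.seed(2)
std_resid = [random.gauss(0, 1) for _ in range(40)]  # hypothetical stand-in for real residuals

for start, count in text_histogram(std_resid):
    print(f"{start:+.1f} | {'*' * count}")
print("skewness:", round(skewness(std_resid), 2))
```

With only 40 points, expect a lumpy bell shape rather than a smooth one; that's the small-sample caveat in action.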
Satisfying the second condition: Variance
The second condition for a multiple regression model is that the residuals have the same variance for each fitted (predicted) value of y. You shouldn't see any change in the amount of spread (variability) in the residuals around the horizontal line at zero as you move from left to right. One particular red flag for the second condition is residuals that fan out, or increase in spread, as you move from left to right on the upper-right plot. Fanning out means that the variability grows for higher and higher predicted values of y, so the condition of equal variability around the fitted line isn't met, and the regression model wouldn't fit well in that case.
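If you want a number to go with your eyeball check for fanning out, one informal rule of thumb is to sort the residuals by fitted value (left to right on the plot) and compare their spread in the top third with the bottom third. This is a rough sketch, not a formal test (a related formal test is the Goldfeld-Quandt test); the function name `spread_ratio`, the one-third split, and the simulated data are all assumptions.

```python
import random
from statistics import pstdev

def spread_ratio(fitted, resid):
    """Residual spread in the top third of fitted values divided by that in the bottom third.
    Near 1 suggests equal spread; well above 1 suggests the residuals fan out."""
    pairs = sorted(zip(fitted, resid))  # order residuals from left to right by fitted value
    third = len(pairs) // 3
    low = [e for _, e in pairs[:third]]
    high = [e for _, e in pairs[-third:]]
    return pstdev(high) / pstdev(low)

# Hypothetical simulated residuals: one set with constant spread, one that fans out.
random.seed(3)
fitted = [random.uniform(10, 100) for _ in range(60)]
even = [random.gauss(0, 5) for _ in fitted]            # same spread everywhere
fanning = [random.gauss(0, 0.1 * f) for f in fitted]   # spread grows with the fitted value

print("even spread ratio:", round(spread_ratio(fitted, even), 2))
print("fanning spread ratio:", round(spread_ratio(fitted, fanning), 2))
```

The ratio only compares the two ends of the plot, so it can miss other shapes (for example, spread that bulges in the middle); the plot itself is still the better guide.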
Checking the third condition: Independence
The third condition is that the residuals are independent; in other words, they don't affect each other. The lower-right plot shows the residuals plotted by observation number, which is the order in which the data came in the sample. If you see a pattern, you have trouble: if you were to connect the dots, so to speak, you might see a straight line, a curve, or some other predictable up or down trend.
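A common numeric companion to the observation-order plot is the Durbin-Watson statistic, which measures how strongly each residual is related to the one just before it. This is a swapped-in technique, not something the plots above compute. Here's a minimal Python sketch with a hypothetical helper `durbin_watson` and made-up residual sequences. Values near 2 suggest no neighbor-to-neighbor pattern; values well below 2 suggest a trend (neighbors move together), and values well above 2 suggest a zigzag (neighbors flip sign).

```python
import random

def durbin_watson(resid):
    """Durbin-Watson statistic for residuals listed in observation (sample) order."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

trend = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]    # connecting the dots gives a straight line
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # zigzag: every point flips sign
random.seed(4)
no_pattern = [random.gauss(0, 1) for _ in range(50)]  # hypothetical independent residuals

for name, r in [("trend", trend), ("alternating", alternating), ("no pattern", no_pattern)]:
    print(f"{name}: DW = {durbin_watson(r):.2f}")
```

The straight-line trend gives about 0.21 and the zigzag about 3.33. The statistic only looks at adjacent observations, so a slow curve or other longer-range pattern is still best caught by eye on the lower-right plot.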