Outliers in Regression
All the points in a simple regression ought to contribute roughly equally to the overall results. If removing a single point dramatically changes the results, we would suspect that it is an outlier that does not belong with the other observations in the regression. The applet below depicts a simple regression relating a student's height to his or her weight. Click on a point to delete it from the analysis. Both the original and the new regression equations are displayed and graphed: the original in blue and the new one, with the point deleted, in green. Clicking on another point restores the previously deleted point and deletes the new one. Clicking on a deleted point restores it.
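Deleting a point and refitting, as the applet does, is a leave-one-out regression. The sketch below assumes an ordinary least-squares fit with a single predictor. The height and weight values are made up for illustration and are not the applet's data, and the helper names `fit_line` and `fit_without` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_without(xs, ys, i):
    """Refit after deleting observation i (like clicking a point in the applet)."""
    return fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])


# Hypothetical heights (inches) and weights (pounds); the last point is deliberately odd.
heights = [60, 62, 64, 66, 68, 70, 72]
weights = [110, 120, 125, 140, 150, 155, 240]

a0, b0 = fit_line(heights, weights)
print(f"all points:   weight = {a0:.1f} + {b0:.2f} * height")

slope_changes = []
for i in range(len(heights)):
    a, b = fit_without(heights, weights, i)
    slope_changes.append(b - b0)
    print(f"drop point {i}: weight = {a:.1f} + {b:.2f} * height (slope change {b - b0:+.2f})")
```

With these made-up numbers, dropping the last point changes the slope far more than dropping any other point, which is the pattern the applet invites you to look for.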
Example with an Outlier
The graph below depicts the relationship between a student's high school rank and his or her quantitative SAT score for a group of engineering students. Unfortunately, the person doing the data entry reversed the two values for one observation; that is, a high school rank was entered as an SAT score and vice versa. Click on the unusual observation to delete it and observe its impact on the overall results. Click on other points to see whether any others have as dramatic an impact. The graph below also displays the optional outlier indices. The lever (leverage) identifies observations whose predictor values are unusual and which therefore have potential influence on the fit, RStudent is a t-statistic testing whether the observation differs from the regression fitted to the other observations, and Cook's D assesses the combined effect on the overall results.
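For a single predictor, all three indices can be computed from one fit using standard closed-form expressions, without refitting once per point. The stdlib-only sketch below is one way to do it; the function name `influence_indices` and the data are hypothetical (the swapped last pair imitates the data-entry error described above, but the numbers are not the applet's), and the applet's own implementation may differ.

```python
import math


def influence_indices(xs, ys):
    """Leverage, RStudent, and Cook's D for each point of a simple (one-predictor) OLS fit."""
    n, p = len(xs), 2  # p = number of estimated coefficients (intercept and slope)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x

    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in resid)
    s2 = sse / (n - p)  # residual variance of the full fit

    lever, rstudent, cooks_d = [], [], []
    for x, e in zip(xs, resid):
        h = 1 / n + (x - mean_x) ** 2 / sxx  # leverage: distance of x from the mean of X
        # Residual variance with this point deleted, obtained without refitting.
        s2_del = (sse - e ** 2 / (1 - h)) / (n - p - 1)
        lever.append(h)
        rstudent.append(e / math.sqrt(s2_del * (1 - h)))
        # Cook's D from the internally studentized residual and the leverage.
        r2 = e ** 2 / (s2 * (1 - h))
        cooks_d.append(r2 / p * h / (1 - h))
    return lever, rstudent, cooks_d


# Hypothetical (rank, SAT) data; the last pair has its two values swapped on entry.
ranks = [70, 75, 80, 85, 90, 95, 650]
sats = [560, 590, 600, 640, 660, 700, 88]

lever, rstudent, cooks_d = influence_indices(ranks, sats)
for i, (h, t, d) in enumerate(zip(lever, rstudent, cooks_d)):
    print(f"point {i}: lever={h:.3f}  RStudent={t:+.2f}  Cook's D={d:.3f}")
```

Because the swapped observation has leverage close to 1, the fitted line is dragged toward it and its raw residual is small. Its RStudent and Cook's D are nonetheless the largest by far, which is why the indices are more informative than the raw residual alone.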
Unusual X, but not Unusual Y
An observation with an extreme value on the predictor variable X, but whose criterion value Y falls close to the regression line determined by the other points, will have little impact on the regression equation itself. It may, however, have a large impact on the test statistics and their probabilities: by widening the spread of X, it shrinks the standard error of the slope, which inflates t and lowers p. Test this by deleting the unusual observation in the graph below and observing the effects on t and p.
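The effect can be reproduced numerically: add a point far out on X that lies exactly on the line fitted to the other points, then compare slope, standard error, and t. The data and the helper name `slope_t` are hypothetical, and p-values are left out to keep the sketch stdlib-only (a larger t corresponds to a smaller p).

```python
import math


def slope_t(xs, ys):
    """Return (slope, standard error of slope, t = slope / SE) for a simple OLS fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2) / sxx)  # SE(b) = s / sqrt(Sxx)
    return slope, se, slope / se


# Hypothetical data with ordinary scatter.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 5, 4, 7, 6, 9]
b0, se0, t0 = slope_t(xs, ys)

# Add one point far out on X, placed exactly on the fitted line of the others.
intercept0 = sum(ys) / len(ys) - b0 * sum(xs) / len(xs)
x_new = 20
xs_plus = xs + [x_new]
ys_plus = ys + [intercept0 + b0 * x_new]
b1, se1, t1 = slope_t(xs_plus, ys_plus)

print(f"without: slope={b0:.3f}  SE={se0:.3f}  t={t0:.2f}")
print(f"with:    slope={b1:.3f}  SE={se1:.3f}  t={t1:.2f}")
```

The slope does not move, because a point with zero residual leaves the least-squares normal equations satisfied, but the much larger Sxx shrinks the standard error and t grows several-fold.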
Unusual Y, but not Unusual X
An observation whose value on the criterion variable Y is unusual relative to a model of the other observations, but whose value on the predictor variable X is not unusual, will not influence the regression line much. However, it will make a large contribution to the error term, inflating the standard errors and making it difficult for the statistical results to be significant.
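A small numerical check, using hypothetical data and a hypothetical helper `fit_stats`: place a new point exactly at the mean of X with a Y value well above the line. In that special case the slope is unchanged and only the intercept shifts, while the residual standard deviation s grows and t falls.

```python
import math


def fit_stats(xs, ys):
    """Return (intercept, slope, residual SD s, t for the slope) for a simple OLS fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation (the "error term")
    return intercept, slope, s, slope / (s / math.sqrt(sxx))


# Hypothetical base data.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 5, 4, 7, 6, 9]
a0, b0, s0, t0 = fit_stats(xs, ys)

# Add a point at the mean of X whose Y lies 8 units above the fitted line.
mean_x = sum(xs) / len(xs)
xs_plus = xs + [mean_x]
ys_plus = ys + [a0 + b0 * mean_x + 8]
a1, b1, s1, t1 = fit_stats(xs_plus, ys_plus)

print(f"without: intercept={a0:.3f}  slope={b0:.3f}  s={s0:.3f}  t={t0:.2f}")
print(f"with:    intercept={a1:.3f}  slope={b1:.3f}  s={s1:.3f}  t={t1:.2f}")
```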
Not Quite Unusual X and Not Quite Unusual Y, but Big Impact
An outlying observation might have a value of the predictor variable X that is not extreme enough to be flagged by the lever and a value of the criterion variable Y that is not unusual enough to be flagged by RStudent. Nevertheless, that combination may still have a large impact on the model, which is the combined effect that Cook's D is meant to capture.
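Cook's D can be written as (r²/p) · h/(1 − h), where h is the leverage, r the internally studentized residual, and p the number of coefficients, so moderate values of h and r multiply together. The sketch below evaluates one hypothetical observation (n = 10, p = 2, h = 0.35, r = 1.5, not taken from any real fit) against common rules of thumb: lever above 2p/n, |RStudent| above 2, and Cook's D above 4/n. These cutoffs are assumptions, not the applet's, and the function names are hypothetical.

```python
import math


def cooks_d(h, r, p):
    """Cook's D from leverage h and internally studentized residual r, with p coefficients."""
    return r ** 2 / p * h / (1 - h)


def rstudent_from_r(r, n, p):
    """Convert an internally studentized residual to RStudent (externally studentized)."""
    return r * math.sqrt((n - p - 1) / (n - p - r ** 2))


# Hypothetical observation in a simple regression (p = 2) with n = 10 points.
n, p = 10, 2
h, r = 0.35, 1.5

t = rstudent_from_r(r, n, p)
d = cooks_d(h, r, p)

# Common rules of thumb (assumed here, not taken from the applet).
print(f"lever    {h:.3f}  cutoff 2p/n = {2 * p / n:.3f}  flagged: {h > 2 * p / n}")
print(f"RStudent {t:.3f}  cutoff 2            flagged: {abs(t) > 2}")
print(f"Cook's D {d:.3f}  cutoff 4/n = {4 / n:.3f}   flagged: {d > 4 / n}")
```

Under these assumed cutoffs, only Cook's D flags the observation, even though neither its leverage nor its RStudent does on its own.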
