I love simple examples that make an important point. Anscombe’s quartet is a great way to show that visualizing data is critical to understanding any statistical pattern.
For anyone who hasn’t seen it before: what do these four graphs have in common?
They have the same basic statistical properties:
| Property (in each of the four datasets) | Value |
|---|---|
| Mean of x | 9 (exact) |
| Sample variance of x | 11 (exact) |
| Mean of y | 7.50 (to 2 decimal places) |
| Sample variance of y | 4.125 ± 0.003 |
| Correlation between x and y | 0.816 (to 3 decimal places) |
| Linear regression line | y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively) |