I love simple examples that make an important point. Anscombe’s quartet is a great way to show that visualizing data is critical to understanding any statistical pattern.

For anyone who hasn’t seen it before: what do these four graphs have in common?

They have the same basic statistical properties:

Property | Value |
---|---|
Mean of x in each case | 9 (exact) |
Sample variance of x in each case | 11 (exact) |
Mean of y in each case | 7.50 (to 2 decimal places) |
Sample variance of y in each case | 4.122 or 4.127 (to 3 decimal places) |
Correlation between x and y in each case | 0.816 (to 3 decimal places) |
Linear regression line in each case | y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively) |
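
If you want to check these numbers yourself, here is a minimal Python sketch. The data values are not given in this post; they are the quartet as commonly reproduced from Anscombe's 1973 paper, so treat them as an assumption. The `summarize` helper and the variable names are my own, chosen for illustration. It computes the mean, sample variance, Pearson correlation, and least-squares line for each of the four datasets using only the standard library.

```python
from statistics import mean, variance

# Anscombe's quartet as commonly reproduced (assumed here, not taken from this post).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics from the table for one dataset."""
    mx, my = mean(x), mean(y)
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx  # ordinary least-squares slope of y on x
    return {
        "mean_x": mx,
        "var_x": variance(x),  # sample variance (divides by n - 1)
        "mean_y": my,
        "var_y": variance(y),
        "corr": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (x, y) in quartet.items():
    s = summarize(x, y)
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} "
        f"r={s['corr']:.3f} y = {s['intercept']:.2f} + {s['slope']:.3f}x"
    )
```

The four printed rows should agree with each other to roughly the precision shown in the table, even though the scatter plots look nothing alike. That is the whole point: the summary numbers alone cannot tell the datasets apart.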