So last time we were working on finding a good way to say “the darts I threw went more or less HERE”.

| Points | Distance | $(x_i-\bar{x})^2$ | $(y_i-\bar{y})^2$ |
|---|---|---|---|
| (-2.2,4.1) | 4.6 | 7.8 | 13.7 |
| (-1.1,2.1) | 2.4 | 2.9 | 2.9 |
| (-0.5,0.7) | 1.1 | 1.2 | 0.1 |
| (-2.5,-3.1) | 4.7 | 9.6 | 12.3 |
| (-0.3,-3.5) | 4 | 0.8 | 15.2 |
| (2.2,3.8) | 3.8 | 2.6 | 11.6 |
| (2.4,1.8) | 2.3 | 3.2 | 2 |
| (3.8,-0.4) | 3.3 | 10.2 | 0.6 |
| (3.2,-1.8) | 3.4 | 6.8 | 4.8 |
| Avg=(0.6,0.4) | Max=4.7 | Avg=5 | Avg=7 |
| | | $\sqrt{Avg}=2.2$ | $\sqrt{Avg}=2.6$ |

And in the end we came up with a formula for “standard deviation”: $\sigma=\sqrt{\frac{1}{n}\sum(x_i-\bar{x})^2}$
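To make the formula concrete, here is a minimal Python sketch that applies it to the nine darts from the table. The helper names `mean` and `std_dev` are my own choices for illustration, and the formula is applied separately to the x- and y-coordinates.

```python
from math import sqrt

# The nine darts from the table above.
darts = [(-2.2, 4.1), (-1.1, 2.1), (-0.5, 0.7), (-2.5, -3.1), (-0.3, -3.5),
         (2.2, 3.8), (2.4, 1.8), (3.8, -0.4), (3.2, -1.8)]

def mean(values):
    """Plain arithmetic average."""
    return sum(values) / len(values)

def std_dev(values):
    """Square root of the average squared distance from the mean."""
    center = mean(values)
    return sqrt(sum((v - center) ** 2 for v in values) / len(values))

xs = [x for x, _ in darts]
ys = [y for _, y in darts]

print(f"center  = ({mean(xs):.1f}, {mean(ys):.1f})")
print(f"sigma_x = {std_dev(xs):.1f}, sigma_y = {std_dev(ys):.1f}")
```

Rounded to one decimal place, this should agree with the bottom rows of the table: a center near (0.6, 0.4) and deviations of about 2.2 and 2.6.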

It looks like the box is at least moderately representative of where the darts went, but not many of the darts actually landed in the box. Why not? We came up with an oddball average distance away from center in the x- and y-directions, and based our rectangle on that, so why does it contain so few of the points? The box in the picture above contains only two of the nine darts, so how can we justify calling it an ‘average’ center of any sort?

Consider for a moment what this would look like if all of the darts had a zero y-value (orange) or a zero x-value (green):

Looking at the orange dots, it looks like there are a bunch to the left and a bunch to the right and not many in the middle, so if our box is in the middle it’s probably OK if it only has a few darts in it. For the greens, five of the nine are in the box so I can’t complain too much about that. So where is the problem? The box we have drawn contains only darts that fit into BOTH categories!

Revisiting basic probability for a moment, recall how we would calculate the probability of rolling a ‘3’ on a normal die while also flipping a coin and getting ‘heads’: the probability of rolling a ‘3’ is $\frac{1}{6}$ and the probability of flipping for ‘heads’ is $\frac{1}{2}$ so the probability of both at the same time is $\frac{1}{6} \times \frac{1}{2} = \frac{1}{12}$. We have to multiply the fractions to find the probability of two independent events happening at the same time.
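If you would rather count than multiply, a short Python sketch can list every die-and-coin outcome and check that both approaches agree. The variable names here are just illustrative.

```python
from fractions import Fraction
from itertools import product

# Every equally likely (die, coin) pair: 6 faces x 2 sides = 12 outcomes.
outcomes = list(product(range(1, 7), ["heads", "tails"]))
favorable = [o for o in outcomes if o == (3, "heads")]

by_counting = Fraction(len(favorable), len(outcomes))
by_multiplying = Fraction(1, 6) * Fraction(1, 2)

print(by_counting, by_multiplying)  # both are 1/12
```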

If we assume that our left-right error has nothing to do with the up-down error in our dart-throwing skills, then we can treat those two axes as independent and do a little multiplication to find out what is likely to be in the box. Just to see what happens, let’s redo the experiment with 100 points and keep an eye on how many points appear in the box when reduced to each axis, and also how many un-moved points stay in the box. We’ll call this our probability of being ‘in the box’. I’ll spare you the listing of 100 points, and finding the averages and standard deviations: we did that in the previous article about darts.

Wow. That graph is a bit of a mess. Even making the blue and red points semi-transparent didn’t help much. If you could count the individual dots, you would find that there are 72 blue dots in the box, 68 red dots in the box, and 50 orange ones. Sure enough, $\frac{72}{100} \times \frac{68}{100} = 0.4896 \approx \frac{50}{100}$, which is about what we would expect since measured probability is not guaranteed to be exactly the same as a prediction.
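You can rerun this kind of experiment yourself. The sketch below is not the data behind the graph: it assumes the left-right and up-down errors are independent and normally distributed, borrows the spreads 2.2 and 2.6 from the nine-dart table, and uses an arbitrary seed. The function name `box_counts` is hypothetical.

```python
import random
from math import sqrt

def std_dev(values):
    """Square root of the average squared distance from the mean."""
    center = sum(values) / len(values)
    return sqrt(sum((v - center) ** 2 for v in values) / len(values))

def box_counts(points):
    """Count points within one sigma of the mean in x, in y, and in both."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    sx, sy = std_dev(xs), std_dev(ys)
    in_x = sum(abs(x - cx) <= sx for x in xs)
    in_y = sum(abs(y - cy) <= sy for y in ys)
    in_both = sum(abs(x - cx) <= sx and abs(y - cy) <= sy for x, y in points)
    return in_x, in_y, in_both

rng = random.Random(1)  # arbitrary seed, only so the run is repeatable
# Assumption: independent, normally distributed errors on each axis.
points = [(rng.gauss(0, 2.2), rng.gauss(0, 2.6)) for _ in range(100)]

in_x, in_y, in_both = box_counts(points)
n = len(points)
print(f"in x: {in_x}, in y: {in_y}, in both: {in_both}")
print(f"predicted in both: {in_x / n * in_y / n:.2f}, measured: {in_both / n:.2f}")
```

With only 100 points the predicted and measured values will usually differ a little, just as 0.4896 and 0.50 do above.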

Arguably, a dart at the far corner of the box is not as ‘good’ a shot as one in the middle of an edge, if you consider the distance from center. Perhaps something useful would be to consider a circle, but if we put a circle inside a rectangle, we lose more darts (we’re already down to about half, not very good if we want to refer to ‘most’ of them) and also circles don’t fit very well inside rectangles. It seems to me that the obvious solution is then to use an ellipse and put it outside, killing both birds with one equation. The math is going to get a little rough here, since finding out whether or not a point is inside an ellipse is a little harder than for a rectangle, and finding the equation of the right ellipse is a little tougher as well. Let’s start with a simple example and work our way up to the darts.

In this graph, there is a rectangle extending 3 units left and right from center, and 2 units up and down. The ellipse fits into the same space, and clearly is smaller than the rectangle. How are we going to figure out how big the ellipse ought to be in order to be outside the rectangle instead of inside? Let’s see if there is a value $k$ by which we can scale both the width $w$ and height $h$ of our ellipse in order to include the point $(w,h)$.
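Here is a sketch of what that scaled equation can look like, assuming the ellipse is centered at the origin and written in the top-half form $y = h\sqrt{1-x^2/w^2}$. Replacing $w$ with $wk$ and $h$ with $hk$ gives:

$$y = hk\sqrt{1-\frac{x^2}{(wk)^2}}$$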

Now that we have an equation with the proper scaling factor in it, let’s replace $(x,y)$ with the specific point $(w,h)$ which, coincidentally, has the same values as the width and height (measured from center, not all the way across) of our ellipse.
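Worked out step by step, and again assuming an origin-centered ellipse in the form $y = hk\sqrt{1-x^2/(wk)^2}$, the substitution might go like this:

$$
\begin{aligned}
h &= hk\sqrt{1-\frac{w^2}{(wk)^2}} \\
1 &= k\sqrt{1-\frac{1}{k^2}} \\
1 &= k^2\left(1-\frac{1}{k^2}\right) = k^2 - 1 \\
k^2 &= 2 \\
k &= \sqrt{2}
\end{aligned}
$$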

Well, perhaps that wasn’t too bad. Starting with the equation of an ellipse of width $w$ and height $h$, substitute $w$ in for $x$ and $h$ in for $y$, making sure to use $wk$ and $hk$ for the width and height of the ellipse we want to find. After that, cancel common factors, square everything to get rid of the square root, and solve for $k^2$. Finally, take the square root to get $k=\sqrt{2}$. Whoever would have guessed that all we have to do is multiply $w$ and $h$ by a simple constant? Then the final equation, which circumscribes the rectangle extending $w$ units left and right and $h$ units up and down, looks like this:
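Written out with $k=\sqrt{2}$ (a reconstruction from the steps just described, assuming an ellipse centered at the origin), the upper half of the ellipse is:

$$y = \sqrt{2}\,h\sqrt{1-\frac{x^2}{2w^2}}$$

or, covering both halves at once:

$$\frac{x^2}{2w^2}+\frac{y^2}{2h^2}=1$$

For the rectangle with $w=3$ and $h=2$, the ellipse reaches $3\sqrt{2}\approx 4.24$ units left and right and $2\sqrt{2}\approx 2.83$ units up and down.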

And the resulting graph looks like this:

Beautiful! Now how many of the points is this ellipse going to contain?

There are 32 orange dots outside the blue ellipse, which leaves $100-32 = 68$ inside, so 68% of the darts are within the ellipse. I just eyeballed that from the graph, so if that’s off by one don’t be too surprised. That’s in line with the 72 blue and 68 red dots above! So perhaps we are at the point where we can say, “Most of my darts are inside that ellipse!”
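Eyeballing is error-prone, so here is a rough Python sketch of letting the computer do the counting. It rests on a few assumptions: the ellipse is centered on the average dart, $w$ and $h$ are the standard deviations in each direction, and the scale factor is $k=\sqrt{2}$, which makes the test $\frac{(x-c_x)^2}{w^2}+\frac{(y-c_y)^2}{h^2}\le k^2=2$. The points are simulated with independent normal errors, not taken from the graph, and the function names are hypothetical.

```python
import random
from math import sqrt

def std_dev(values):
    """Square root of the average squared distance from the mean."""
    center = sum(values) / len(values)
    return sqrt(sum((v - center) ** 2 for v in values) / len(values))

def inside_ellipse(x, y, cx, cy, w, h):
    """True if (x, y) is in the ellipse around (cx, cy) that circumscribes
    the rectangle reaching w left/right and h up/down (scale k = sqrt(2))."""
    return ((x - cx) / w) ** 2 + ((y - cy) / h) ** 2 <= 2  # 2 is k squared

def count_in_ellipse(points):
    """Count points inside the circumscribing ellipse built from their own
    mean and standard deviations."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    w, h = std_dev(xs), std_dev(ys)
    return sum(inside_ellipse(x, y, cx, cy, w, h) for x, y in points)

rng = random.Random(1)  # arbitrary seed, only so the run is repeatable
# Assumption: independent, normally distributed errors on each axis.
points = [(rng.gauss(0, 2.2), rng.gauss(0, 2.6)) for _ in range(100)]

print(f"{count_in_ellipse(points)} of {len(points)} darts are inside the ellipse")
```

As a sanity check, the corner $(3,2)$ of the example rectangle sits exactly on the boundary: `inside_ellipse(3, 2, 0, 0, 3, 2)` is `True`, while a point slightly farther out, such as $(3.1, 2)$, falls outside.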

But…

What happens if our high darts tend to be to the right, and low ones to the left?

That is a story for next time. Until then, keep throwing those darts!