Suppose for a moment that you throw darts at a dartboard occasionally, and want to know ahead of time the most likely place for your darts to go when you are aiming at a particular spot. Assuming you are not a professional, this could be a fairly large patch. With that in mind, let us now take a journey through one of my other self-refocussing exercises that turned out to be useful.
Where is the center of the three darts thrown? That’s pretty easy: take the average of the points, which means take the average of the x-values, and of the y-values, and plot that as another point:
More darts just means more points on the graph, but doesn’t really change any of the math, so we’ll go on. If we considered the average of the points as the center of a circle with the radius reaching to the farthest point, what would that circle look like?
Great! Now we know that of the darts thrown, all of them landed in that circle. True, yes, useful…not so much. What if I had kept track of more darts and wanted to know where “most” of them went? That question is just a little bit more interesting.
Let’s start with a bunch of (made up, but work with me for a bit here) thrown darts in orange, and the average of them marked in blue:
Points |
---|
(-2.2,4.1) |
(-1.1,2.1) |
(-0.5,0.7) |
(-2.5,-3.1) |
(-0.3,-3.5) |
(2.2,3.8) |
(2.4,1.8) |
(3.8,-0.4) |
(3.2,-1.8) |
Average=(0.6,0.4) |
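If you would rather let a computer do the adding, here is a minimal Python sketch of the same averaging. The names `darts` and `centroid` are my own for illustration; the points are the made-up ones from the table.

```python
# Dart positions copied from the table above (units are arbitrary).
darts = [
    (-2.2, 4.1), (-1.1, 2.1), (-0.5, 0.7),
    (-2.5, -3.1), (-0.3, -3.5), (2.2, 3.8),
    (2.4, 1.8), (3.8, -0.4), (3.2, -1.8),
]


def centroid(points):
    """Average the x-values and the y-values separately."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y


center = centroid(darts)
# Round to one decimal place, as in the table.
print(tuple(round(c, 1) for c in center))  # (0.6, 0.4)
```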
The question now is how far away is the farthest point? Sadly, the easiest way to find it is to calculate the distance of each point from the center, $\sqrt{(x_i-\bar{x})^2+(y_i-\bar{y})^2}$, and pick the largest.
Points | Distance |
---|---|
(-2.2,4.1) | 4.6 |
(-1.1,2.1) | 2.4 |
(-0.5,0.7) | 1.1 |
(-2.5,-3.1) | 4.7 |
(-0.3,-3.5) | 4 |
(2.2,3.8) | 3.8 |
(2.4,1.8) | 2.3 |
(3.8,-0.4) | 3.3 |
(3.2,-1.8) | 3.4 |
Average=(0.6,0.4) | Maximum=4.7 |
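The distance column can be reproduced with a short Python sketch. It assumes the same made-up points and uses the unrounded center, so a last digit could differ from figures worked out by hand from rounded values; the variable names are mine.

```python
import math

# Dart positions copied from the table above.
darts = [
    (-2.2, 4.1), (-1.1, 2.1), (-0.5, 0.7),
    (-2.5, -3.1), (-0.3, -3.5), (2.2, 3.8),
    (2.4, 1.8), (3.8, -0.4), (3.2, -1.8),
]

n = len(darts)
center_x = sum(x for x, _ in darts) / n
center_y = sum(y for _, y in darts) / n

# Straight-line distance of every dart from the center.
distances = [math.hypot(x - center_x, y - center_y) for x, y in darts]
farthest = max(distances)

for point, d in zip(darts, distances):
    print(point, round(d, 1))
print("maximum:", round(farthest, 1))
```

The largest value is the radius of the circle that holds every dart.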
Since I’m not really sure what to do next, let’s repeat ourselves for a moment and consider taking another average. For each dart, let’s take how far its $x$ value and its $y$ value sit from our center point, then average the $x$ differences and the $y$ differences separately; that will at least tell us whether my aim is better up-and-down or left-to-right. Since I’m not really sure what to call this average distance, I’ll just pick a random letter $\sigma$ and use that. Since we have two distances, one left-right and the other up-down, let’s subscript the $\sigma$ as $\sigma_x, \sigma_y$ so we know which one is which.
Points | Distance | $x_i-\bar{x}$ | $y_i-\bar{y}$ |
---|---|---|---|
(-2.2,4.1) | 4.6 | -2.8 | 3.7 |
(-1.1,2.1) | 2.4 | -1.7 | 1.7 |
(-0.5,0.7) | 1.1 | -1.1 | 0.3 |
(-2.5,-3.1) | 4.7 | -3.1 | -3.5 |
(-0.3,-3.5) | 4 | -0.9 | -3.9 |
(2.2,3.8) | 3.8 | 1.6 | 3.4 |
(2.4,1.8) | 2.3 | 1.8 | 1.4 |
(3.8,-0.4) | 3.3 | 3.2 | -0.8 |
(3.2,-1.8) | 3.4 | 2.6 | -2.2 |
Avg=(0.6,0.4) | Max=4.7 | Avg=0 | Avg=0 |
Oops. What happened? Some of our $x_i-\bar{x}$ (and $y_i-\bar{y}$) were positive, and some negative, so in the end they cancelled out. So how do we ensure that none of the differences count as negative? The easiest way I can think of is to ignore the sign, but I don’t really know how to do sums when I have to treat some of the terms differently. The next best choice may be to square everything, which we know never gives a negative number.
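The cancellation is not a fluke of these particular darts. Writing $n$ for the number of darts and using the definition of the average, $\bar{x}=\frac{1}{n}\sum x_i$, the differences always add up to zero:

$$\sum_{i=1}^{n}(x_i-\bar{x}) = \sum_{i=1}^{n}x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$

The same holds for the $y_i-\bar{y}$ column. Any small leftover you get when adding the table by hand comes from rounding each entry to one decimal place.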
Points | Distance | $(x_i-\bar{x})^2$ | $(y_i-\bar{y})^2$ |
---|---|---|---|
(-2.2,4.1) | 4.6 | 7.8 | 13.7 |
(-1.1,2.1) | 2.4 | 2.9 | 2.9 |
(-0.5,0.7) | 1.1 | 1.2 | 0.1 |
(-2.5,-3.1) | 4.7 | 9.6 | 12.3 |
(-0.3,-3.5) | 4 | 0.8 | 15.2 |
(2.2,3.8) | 3.8 | 2.6 | 11.6 |
(2.4,1.8) | 2.3 | 3.2 | 2 |
(3.8,-0.4) | 3.3 | 10.2 | 0.6 |
(3.2,-1.8) | 3.4 | 6.8 | 4.8 |
Avg=(0.6,0.4) | Max=4.7 | Avg=5 | Avg=7 |
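Here is a small Python sketch of the “square, then average” step, assuming the same made-up points. The helper name `mean_square_deviation` is my own; the standard library’s `statistics.pvariance` computes the same divide-by-$n$ quantity, so it serves as a cross-check.

```python
import math
import statistics

# Dart positions copied from the table above.
darts = [
    (-2.2, 4.1), (-1.1, 2.1), (-0.5, 0.7),
    (-2.5, -3.1), (-0.3, -3.5), (2.2, 3.8),
    (2.4, 1.8), (3.8, -0.4), (3.2, -1.8),
]
xs = [x for x, _ in darts]
ys = [y for _, y in darts]


def mean_square_deviation(values):
    """Average of the squared differences from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


msd_x = mean_square_deviation(xs)
msd_y = mean_square_deviation(ys)
print(round(msd_x, 1), round(msd_y, 1))  # 5.0 7.0

# Cross-check against the library's population variance.
assert math.isclose(msd_x, statistics.pvariance(xs))
assert math.isclose(msd_y, statistics.pvariance(ys))
```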
I’m not sure I want to make a box that extends 5 units left and right, and 7 units up and down (the blue box in the graph below). It seems way too big, well outside the darts farthest from center, and doesn’t look at all like an average of the darts. Perhaps I should follow the advice of my science teacher: “When something looks really wrong at the end, follow the units.” Let’s call everything inches even though the graph doesn’t have anything on it (my dartboard doesn’t either). In $(x_i-\bar{x})^2$, inches minus inches is still inches, and inches times inches gives inches squared. Wait! The $n$ in the $\frac{1}{n}$ we used to take the average doesn’t have any units; it is just the number of darts thrown, so dividing by $n$ doesn’t change the square inches. So really, what I want to do is take the square roots of 5 and 7 to get the actual size (in inches from center, not square inches) of the box to draw, colored green in the graph below.
Points | Distance | $(x_i-\bar{x})^2$ | $(y_i-\bar{y})^2$ |
---|---|---|---|
(-2.2,4.1) | 4.6 | 7.8 | 13.7 |
(-1.1,2.1) | 2.4 | 2.9 | 2.9 |
(-0.5,0.7) | 1.1 | 1.2 | 0.1 |
(-2.5,-3.1) | 4.7 | 9.6 | 12.3 |
(-0.3,-3.5) | 4 | 0.8 | 15.2 |
(2.2,3.8) | 3.8 | 2.6 | 11.6 |
(2.4,1.8) | 2.3 | 3.2 | 2 |
(3.8,-0.4) | 3.3 | 10.2 | 0.6 |
(3.2,-1.8) | 3.4 | 6.8 | 4.8 |
Avg=(0.6,0.4) | Max=4.7 | Avg=5 | Avg=7 |
 | | $\sqrt{Avg}=2.2$ | $\sqrt{Avg}=2.6$ |
I’m still not really happy about those boxes, though. The green one is too small, and the blue one too big. On the bright side, we did ‘accidentally’ come up with the formula for standard deviation: $\sigma=\sqrt{\frac{1}{n}\sum(x_i-\bar{x})^2}$
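To put a number on “too small”, here is a rough Python sketch that computes $\sigma_x$ and $\sigma_y$ with the standard library’s `statistics.pstdev` (the divide-by-$n$ version, matching the formula above) and counts the darts inside the green box. I am assuming the box reaches $\sigma_x$ left and right and $\sigma_y$ up and down from the center; the variable names are mine.

```python
import statistics

# Dart positions copied from the tables above.
darts = [
    (-2.2, 4.1), (-1.1, 2.1), (-0.5, 0.7),
    (-2.5, -3.1), (-0.3, -3.5), (2.2, 3.8),
    (2.4, 1.8), (3.8, -0.4), (3.2, -1.8),
]
xs = [x for x, _ in darts]
ys = [y for _, y in darts]

mean_x = statistics.mean(xs)
mean_y = statistics.mean(ys)

# Population standard deviation: square root of the mean squared deviation.
sigma_x = statistics.pstdev(xs)
sigma_y = statistics.pstdev(ys)
print(round(sigma_x, 1), round(sigma_y, 1))  # 2.2 2.6

# A dart is inside the box when it is within one sigma of the
# center in BOTH directions (assumed definition of the green box).
inside = [
    (x, y) for x, y in darts
    if abs(x - mean_x) <= sigma_x and abs(y - mean_y) <= sigma_y
]
print(len(inside), "of", len(darts), "darts fall inside the box")
```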
The biggest problem that I see is that the green box clearly does not contain enough of the darts to be useful. We’ll figure out why next time, and see if we can work our way towards a solution.