Pebble Watch

I recently purchased a Pebble watch. It tells the time really well, which is quite exciting for someone who hasn’t owned a watch for around 8 years.

It does of course do a bunch of other really useful stuff. In particular, it talks Bluetooth to my mobile phone, which lets it display text messages, emails and incoming call notifications.

These notifications are surprisingly useful, particularly when cycling or driving, as it saves me from fumbling in pockets while otherwise occupied.

The Pebble is also highly programmable. It comes with a development SDK which lets you write custom watch faces as well as full-blown applications.

Party time

Last night I wrote my first watch face.

The whole process from start to finish was surprisingly painless (once I read the instructions correctly). I’m looking forward to writing some slightly more interesting watch faces and apps in the future.

Australia Votes 2013

With the federal election looming, there’s going to be lots of interesting data to play with, and I’ll be doing a bit of blogging of things that take my interest in the next couple of weeks.

To get started, I’ve been looking at the NSW Senate preferences. Currently the best source of information is this PDF from the AEC. If you’re interested in doing any analysis of this information, then you might prefer to have this in raw form.

I’ve scraped the data from its raw form and turned it into a JSON file. Feel free to use this however you will, and let me know if you find any errors in it!

Intro to Python Context Managers

Python context managers are a powerful tool for expressing a common programming pattern in a simple, concise manner. Most commonly they are used to handle opening and closing files, but this is just the beginning of what they can be used for.

Consider the following code snippet, which measures the time it takes to make a list of the first ten thousand square numbers.
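
The original snippet isn’t reproduced here, but a minimal stand-in with the same enter/process/exit shape might look something like this (assuming time.time() for the clock):

import time

start = time.time()                                  # 1) enter: start the timer
squares = [i * i for i in range(10000)]              # 2) process: build the list of squares
print('Took %f seconds' % (time.time() - start))     # 3) exit: stop the timer and report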

We have three steps here: 1) we initialise our timer, 2) we perform some processing, and 3) we stop our timer and print the result. We’ll call these three steps enter, process and exit.

The important thing to note here is that the enter and exit steps are codependent. They come as a logical pairing, and it doesn’t make sense to do one without the other.

In this example the enter and exit blocks are physically close to one another, so it is easy to see how they match up, but if we were timing a larger block of code it might not be so obvious. It would be great if we could bundle them together somehow.

Also, we might want to use this same timing pattern in a bunch of places throughout our code. It would be nice if we could put the enter/exit code into something like a function.

The solution is to create a context manager.
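
The original listing isn’t included here, but a minimal version consistent with the description below would be:

from contextlib import contextmanager
import time

@contextmanager
def timer():
    start = time.time()                                # enter: start the timer
    yield                                              # hand control to the process block
    print('Took %f seconds' % (time.time() - start))   # exit: stop the timer and report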

In this example timer() is a function which returns a context manager. The details of how it does this are all taken care of by the @contextmanager decorator. The key element to notice is that a yield statement must be placed between the enter and exit code.

To use the context manager we use the with statement, which is followed by an indented block of code containing the process code.
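
Continuing the sketch above, the timed version of the earlier snippet would look something like:

with timer():
    squares = [i * i for i in range(10000)]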

By factoring the enter/exit code into a context manager function we have created a timing device which can be reused throughout our code. Furthermore, the codependent enter and exit code are kept together, which reduces the likelihood of introducing errors when changes need to be made to the logic.

Finally, the process code is identifiable as a single unit which is “wrapped” by the enter/exit code of the context manager, as it is indented as a block below the with statement.

The enter/process/exit pattern is one which shows up in a range of places, and context managers are a neat way to implement this pattern. There are many complex things which can be done with context managers, but these examples show a simple way to get up and running with them to refactor a common, simple pattern.


The examples shown work in Python versions 2.6 and 2.7. Stay tuned for future posts which will go into more details about how to use context managers in more advanced ways.

Strongly Connected Components

The problem of finding the strongly connected components of a directed graph has been solved in an algorithmically optimal manner for over 40 years. In this article we look at an algorithm which is efficient not only algorithmically, but also from a memory usage perspective.

Directed Graphs

One of the fundamental data structures in computer science is the directed graph. It finds applications in everything from quantum mechanics to physical mail delivery.

A directed graph can be thought of as a collection of circles connected by arrows. An arrow tells us that it is possible to get from one circle to another. If we can get from one circle to another circle by following a series of arrows (in the correct direction) then we say that these circles are connected.

In many cases we might be able to get from circle “A” to circle “B” by following arrows, but we might not be able to get from B back to A. In this case we say that A and B are weakly connected. If it’s possible to get from A to B and also from B back to A then we say that they are strongly connected.

If A is strongly connected to both B and C then we know that B and C are also strongly connected to each other, since we can get between them by going via A.

By grouping together circles which are strongly connected to each other we can break a graph up into strongly connected components (SCCs).

It turns out the problem of breaking up a graph into strongly connected components is one which pops up all the time in areas including disease transmission and ocean current analysis.

In the rest of this article we’ll look at algorithms to solve this problem and how they’re implemented in Scipy to be super fast and efficient.

SCC Algorithms

The canonical SCC algorithm was published by Robert Tarjan in 1972 [PDF]. His paper describes a recursive algorithm which is based on a depth first search. Many implementations of this algorithm can be found in a variety of languages.

Tarjan’s algorithm is algorithmically optimal, since the time taken to compute the SCCs is linear in both the number of circles (nodes) and arrows (edges). One of its drawbacks however is that it is a recursive algorithm. When implemented in most languages recursion is both slow and memory intensive due to the need to maintain a complete call stack.

One way to mitigate this problem is to convert the recursive algorithm into an iterative one. It can be shown that this allows the problem to be solved with a per-node memory usage of 5 words plus 2 bits, or 162 bits per node when working with 4 byte words. This equates to approximately 19.3MB per million nodes. This of course assumes that the algorithm has been implemented with memory optimisation in mind, which is often not the case!

Now 5 words per node sounds pretty good, but if we could shave this down by a word or two we could make significant savings! Is such an improvement possible?

In 2005 David Pearce analysed this exact question and came to the conclusion that with a bit of tweaking, Tarjan’s original algorithm could be modified to only require 3 words per node (or 11.4MB per million nodes) [PDF]. So far this is the most efficient (in terms of memory usage) algorithm known to solve the SCC problem.

Pearce’s paper presented pseudocode for his algorithm in recursive format in order to contrast it with Tarjan’s original algorithm. This recursive algorithm was implemented within Scipy in version 0.11.0 (scipy.sparse.csgraph.connected_components).

This implementation suffered from being recursive and used orders of magnitude more memory than the theoretically possible 3 words per node. In version 0.12.0 of Scipy I implemented the iterative version with optimal memory usage. This version was able to handle graphs around 7,500 times larger than the original before hitting the memory limits of a standard PC.
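
For reference, calling the Scipy routine looks something like the following (the particular label values returned are an implementation detail):

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# A small directed graph: 0 -> 1 -> 2 -> 0 form a cycle, with 2 -> 3 hanging off it.
graph = csr_matrix(np.array([[0, 1, 0, 0],
                             [0, 0, 1, 0],
                             [1, 0, 0, 1],
                             [0, 0, 0, 0]]))

n_components, labels = connected_components(graph, directed=True, connection='strong')
print(n_components)   # 2 -- one component for {0, 1, 2} and one for {3}
print(labels)         # nodes 0, 1 and 2 share a label; node 3 gets its own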

Let’s have a look at how this algorithm works and how we are able to run it with such a low memory overhead.

Pearce’s SCC Algorithm – Details

We start with two counters, index and label. We initialise these to 1 and N respectively, where N is the number of nodes in our graph.

We now pick an arbitrary node as a starting point and perform a depth first search (DFS) traversal of our graph. At each node we assign the current value of index to the node and increment index by one. In this way we assign incrementing numbers to each node which is weakly connected to our starting node. We call the number assigned to each node its link number.

The clever part of the algorithm comes when we have processed all the children of a node and are ready to backtrack up the tree to continue the DFS traversal. When backtracking past a node, we consider all of its children and find the smallest link number of these child nodes. If this is smaller than the link number of the current node, we update the link number of the current node and put the node onto a stack.

If none of the children have a smaller link number then we have found a root node. None of the children of this node point to a node with a smaller link number than the current one, which means that they can’t possibly be strongly connected to any nodes “lower” in the graph.

In this case we assign the value of label to the root node. We also assign label to all the nodes in the stack with a link number greater than or equal to the root’s link number, removing them from the stack at the same time. Once we’ve done all this we decrease label by 1 and decrease index by the number of nodes we just updated.

Once we have completed this backtracking step for all the nodes in our depth first traversal from the original node we will have assigned a label to all the nodes (weakly) connected to the original node. There might still be nodes which are completely disconnected from the original node, so we repeat the process for every node in our graph, skipping nodes which have been assigned a label already.

At the end of all this each node will have a label assigned to it, and all nodes which are strongly connected to each other will have the same label. As such, we’ve solved the problem of determining the SCCs of the graph.
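
To make the description concrete, here is a rough recursive sketch of the procedure in Python. The real Scipy implementation is iterative and much more careful about memory; this is purely illustrative.

def pearce_scc(adj):
    # adj[v] is the list of nodes reachable from v by a single arrow.
    n = len(adj)
    rindex = [0] * n             # link number while active, SCC label once finished
    stack = []                   # backtrack stack of non-root nodes
    counters = {'index': 1, 'label': n}

    def visit(v):
        root = True
        rindex[v] = counters['index']
        counters['index'] += 1
        for w in adj[v]:
            if rindex[w] == 0:
                visit(w)
            if rindex[w] < rindex[v]:
                rindex[v] = rindex[w]    # take the smallest link number seen
                root = False
        if root:
            # v is a root: label it and everything on the stack above its link number.
            counters['index'] -= 1
            while stack and rindex[stack[-1]] >= rindex[v]:
                w = stack.pop()
                rindex[w] = counters['label']
                counters['index'] -= 1
            rindex[v] = counters['label']
            counters['label'] -= 1
        else:
            stack.append(v)

    for v in range(n):
        if rindex[v] == 0:
            visit(v)
    return rindex                # nodes in the same SCC share the same value

# Example: 0 -> 1 -> 2 -> 0 is a cycle, 2 -> 3 dangles off it.
print(pearce_scc([[1], [2], [0, 3], []]))   # [3, 3, 3, 4]: {0, 1, 2} share a label, 3 has its own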

The following tool allows you to step through this algorithm for an example directed graph.



Memory Usage Analysis

We can now look at how much memory we need for this algorithm.

For each node we use one word of memory to store the link number/label.

To perform the DFS we need to keep a stack of nodes. We only need to visit each node once. As such, if we’re trying to push a node which is already on the stack we would like to be able to extract the node from inside the stack and move it to the top of the stack. We can implement this data structure with a doubly linked list in a fixed sized array, which requires 2 words per node. The operation of looking for an element within the stack and moving it to the top can be performed in constant time, maintaining the linear algorithmic complexity of the complete algorithm.

We also need to keep a stack of backtracked nodes, which would usually require another one word per node. As we can observe from the interactive tool above, nodes on the backtrack stack have already been removed from the DFS stack. This means we can share the memory used by the DFS stack and implement the backtrack stack as a singly linked list in a fixed size array, requiring no extra memory.

This gives us a total of 3 words per node, as claimed by Pearce. As such we can claim that the SCC algorithm implemented within Scipy is optimal from a memory usage perspective, whilst maintaining linear computational complexity.

Time, what is time?

So earlier this evening Commander Chris Hadfield, a Canadian astronaut on the ISS, tweeted a question about how much time dilation he and his crewmates experience while in orbit.

I gave a quick reply which ended up generating quite a bit of discussion, so I thought I might write up a slightly more detailed answer to the question.

The problem as posed relates to the special theory of relativity. Objects in relative motion will measure time running at a different rate. In this case, the astronauts travelling on the ISS are travelling quite quickly relative to an observer on the ground.

We can measure this time dilation using the Lorentz transformation. We begin by taking the speed of the space station, and use it to calculate a quantity called \gamma.

  \gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}}.

We use the values
 v = 8km/s = 8000m/s\\  c = 300,000,000m/s

and get

  \gamma = \frac{1}{\sqrt{1 - \frac{8000^2}{300000000^2}}}\\   = \frac{1}{\sqrt{0.9999999992888889}}\\   = 1.0000000003555556\\

So this is our time dilation factor, which tells us how much slower the clocks on the space station are running.

Now, if the astronauts have been up there for what we on Earth would measure as 5 months, how much time have the astronauts experienced with their slow running clocks? Well, we can work that out with the equation

t = \frac{t_0}{\gamma}.

Here we let t_0 be 5 months, or 5 \times 30 \times 24 \times 60 \times 60 = 12960000s. So the time experienced by the astronauts will be

  t = \frac{12960000s}{1.0000000003555556}\\   = 12959999.995392s.

This is very close to exactly 5 months, but it’s just a tiny bit smaller. How much smaller is it? Well,

  \Delta t_{SR} = t_0 - t\\   = 12960000s - 12959999.995392s\\   = 0.004608s

This is much less than a single second! In fact, it’s a touch over 4.6 milliseconds.

So there we go, the astronauts will be 4.6 milliseconds younger than they would be if they’d stayed planted on Earth. The time difference is due to time dilation, a direct consequence of their movement relative to Earth and the theory of special relativity.

But this is only part of the picture! My answer to the original question only took into account the relative speeds of the space station and an observer on Earth. If we delve deeper, we find that gravity also has an effect on the rate at which clocks tick. When standing at sea level, we’re around 6371km from the centre of the Earth. But the space station orbits at around 370km above sea level, which puts them 6741km from the centre of the Earth. Because of this difference in distance, the space station experiences a weaker gravity field than we do here on terra-firma.

The effect of gravity on time can be calculated by using the equations from general relativity. Gravity causes time to run more slowly, so we need to work out how strong this effect is both on Earth, and on the space station. We’ll use the following equation to work out the time dilation factor due to gravity.

  \gamma = \frac{1}{\sqrt{1 - \frac{r_0}{R}}}.

In this equation r_0 is the Schwarzschild radius of the Earth, which is based on its gravity. It essentially tells us how far away the event horizon would be if the Earth collapsed into a black hole. It turns out to be quite small, just 9mm, or 0.009 metres.

The variable R is the distance from the centre of the Earth. Based on the numbers above we can work out that for someone standing at sea level we get

  \gamma_E = \frac{1}{\sqrt{1 - \frac{0.009}{6371000}}}\\   = \frac{1}{\sqrt{0.9999999985873489}}\\   = 1.0000000007063257.

For a person on the space station, we get

  \gamma_S = \frac{1}{\sqrt{1 - \frac{0.009}{6741000}}}\\   = \frac{1}{\sqrt{0.9999999986648865}}\\   = 1.000000000667557.

We can now work out how long 5 Earth months would take on the space station, by considering the relative clock speeds on Earth and the space station.

  t = t_0\times\frac{\gamma_E}{\gamma_S}\\    = 12960000s \times \frac{1.0000000007063257}{1.000000000667557}\\    = 12960000.000502443s.

So we see that the effect of general relativity causes the astronauts to experience more time than if they’d stayed on Earth! How much more time?

  \Delta t_{GR} = t_0 - t\\   = 12960000s - 12960000.000502443s\\   = -0.000502443s,

or around 0.5 milliseconds.

So if we put together the effects of both general relatively (gravity) and special relativity (speed differences) we get the total time difference,

  \Delta t_{tot} = \Delta t_{SR} + \Delta t_{GR}\\   = 0.004608 - 0.000502443\\   = 0.004105557s.
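
As a quick sanity check on the arithmetic, the whole calculation fits in a few lines of Python (using the same rounded values for v, c and the radii as above):

import math

c = 3.0e8                     # speed of light (m/s)
v = 8000.0                    # ISS orbital speed (m/s)
r_earth = 6371000.0           # sea level distance from the centre of the Earth (m)
r_iss = 6741000.0             # ISS distance from the centre of the Earth (m)
r_s = 0.009                   # Schwarzschild radius of the Earth (m)
t0 = 5 * 30 * 24 * 60 * 60    # five months, in seconds

gamma = 1.0 / math.sqrt(1.0 - v**2 / c**2)        # special relativity
dt_sr = t0 - t0 / gamma

gamma_e = 1.0 / math.sqrt(1.0 - r_s / r_earth)    # general relativity, at sea level
gamma_s = 1.0 / math.sqrt(1.0 - r_s / r_iss)      # general relativity, on the ISS
dt_gr = t0 - t0 * gamma_e / gamma_s

print(dt_sr, dt_gr, dt_sr + dt_gr)   # roughly 0.0046, -0.0005 and 0.0041 seconds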

So there we have it, the answer to the Commander’s question is 4.1 milliseconds! 5 months in space to travel into the future by less than the blink of an eye. Worth it? Totally!

A quick algorithms problem

Here’s a quick little problem to try. My solution (in python) is below but I’d love to see different ways to solve it.

Problem

Assume you have a list of 8 items numbered 0 through 7; however, they are in some random order. You would like to encode the order they are in using the least number of bits possible.

There are 8! possible options for how they are ordered, which could conceivably be numbered 1..40320.  This could then be encoded as a 16-bit int.

Find an efficient way to determine the correct encoding for a given list, and vice versa to determine the ordering for a given encoding.

Solution

def decode(x):
    # Treat x as a number in the factorial base: each "digit" selects one of
    # the remaining items, using place values 7!, 6!, ..., 1!.
    digits = [0,1,2,3,4,5,6,7]
    values = []
    for modval in [5040, 720, 120, 24, 6, 2, 1]:
        index, x = divmod(x, modval)
        digit = digits[index]
        digits.remove(digit)
        values.append(digit)
    values.append(digits[0])  # only one item remains, so its position is forced
    return values

def encode(values):
    # The inverse: the position of each item among those not yet used gives
    # the factorial-base digit for that place value.
    result = 0
    digits = [0,1,2,3,4,5,6,7]
    for mulval, value in zip([5040, 720, 120, 24, 6, 2, 1], values[:-1]):
        index = digits.index(value)
        result += mulval*index
        digits.remove(value)
    return result

# Round-trip check over all 8! = 40320 possible orderings.
for x in range(8*7*6*5*4*3*2*1):
    values = decode(x)
    xx = encode(values)
    assert x == xx

Trigonometry: The study of… circles?

Trigonometry, we are taught, is the study of triangles. In particular, right angled triangles come in for much attention, as they happen to have particularly nice properties. Pythagoras’ theorem relating the lengths of the sides of a right angled triangle could make a robust claim for being the most famous formula in all of mathematics.

Pythagoras' Theorem

Source: Pedro Sanchez

Pythagoras’ theorem is great; it’s easy to remember, it’s easy to compute, and it’s clearly useful. Most people don’t have a problem with this first foray into trigonometry. But the world of trig beyond Pythagoras quickly descends into a mess of sines, cosines, thetas and other confusing terminology. Not surprisingly, students resort to rote learning of formulae and rarely have a concrete understanding of what it is they’re doing.

As students progress, more trig functions are introduced and more identities committed to memory. HSC students across the state can be found before exams, huddled around cheat sheets jam packed with symbols and equations, desperately trying to load as much into short term memory as possible, hoping that when they splurge their “knowledge” onto the exam paper it will come up trumps.

Even as a maths nerd, I’ve always struggled with trig. I think this is largely due to the emphasis on remembering specific formulas and identities (SOHCAHTOA anyone?) without delving into the underlying geometry to understand where these relationships come from and how they’re related.

Today I’m going to give a brief refresher course on how trig is currently taught, so you can relive the confusion of your childhood. Once I’ve broken your souls, I’ll build them back up by presenting all the trig functions as they should have been taught (IMHO, IANA maths teacher, YMMV, etc).

Learning trig the hard way

To introduce students to trig, we start with a right angled triangle. One of the angles is labeled \theta, and the sides are labeled “adjacent”, “opposite” and “hypotenuse”.

We then define three functions,

$latex
\sin(\theta) = opposite / hypotenuse\\
\cos(\theta) = adjacent / hypotenuse\\
\tan(\theta) = opposite / adjacent,
$

and tell students that these functions are important, so they must remember them. In fact, remembering these equations is so important that we give them an acronym that will stick with students for life: SOHCAHTOA. I’m not going to lie, I used this acronym to write the equations out in this post.

In the fullness of time, it gets revealed that the full names for these functions are actually “sine”, “cosine” and “tangent”, although the etymology is not delved into. “Aren’t tangents related to curves?”, a bewildered student might ask. “Yes, but those are different, this tangent is the one about triangles”, the muddled teacher might respond.

And so from the simple right angled triangle we get this trio of functions: sin, cos and tan. It’s not really clear why these functions are important, or how they’re related to each other, just that they can be used to “solve triangles”. Once the students have really got the hang of doing that, they get introduced to three new functions:

$latex
\sec(\theta) = \frac{1}{\cos(\theta)}\\
\csc(\theta) = \frac{1}{\sin(\theta)}\\
\cot(\theta) = \frac{1}{\tan(\theta)}
$

These are the (multiplicative) inverses of the normal trig functions (not to be (but often) confused with the inverse trig functions). It’s again not clear why these functions are important. Or why we need new names for these things. Or where the names came from. Or what they’re useful for. But they let us write down many more formulae! And of course, the students get to blindly learn many of these new formulae.

What have we learned here?

So, let’s summarise the take home message of high school trig:

  • Triangles
  • Memorise lots of formulas
  • SOHCAHTOA

Or something like that; the students aren’t really sure. It’s no surprise that for many students this component of the curriculum is a baffling speed bump which needs to be overcome rather than understood.

But they are earnestly assured that trigonometry is important, that they’ll need it at university if they want to study science, economics, or engineering. How can something as demonstrably tedious as the study of triangles be useful in such a broad range of contexts? The answer is quite simple: it’s not.

Sure, triangles show up here and there when doing things like architecture or surveying, but aside from these fields which use lots of straight lines in a plane there’s just not that much of a demand for the services of this simple geometric shape.

It’s all about circles

The reason we see trig show up in so many fields of science and engineering is that trigonometry is not actually the study of triangles, it is the study of circles.

Allow me to repeat that so it really sinks in.

Trigonometry is the study of circles.

And circles show up everywhere! They are the simplest of all the shapes, yet the most powerful. Any system which has repeating motion can be described using circles, which leads us to have an understanding about the physics of everything from a pendulum to light. It tells us how electrical circuits work and how the waves in the ocean behave. In fact, one of the most fundamental ideas in physics, that of quantum mechanics, is based on the idea that “stuff” behaves like an oscillating wave, and can therefore be described using circles.

This is the reason why trigonometry is so important, and shows up in so many places. It is because circles show up everywhere, and trigonometry is the study of circles!

“OK, but hold on, didn’t we get the trig functions by looking at triangles and all that SOHCAHTOA stuff?”.

Sure, we did, and that was bad. It wasn’t wrong, because that’s a perfectly valid way to define the trig functions, but it turns out there’s a better way to define them. One that involves circles. One that fundamentally makes much more sense. One that will let you intuitively understand how these functions behave. One that will let you see the links between the trig functions and understand how they might be applied to problems involving circles (i.e. interesting, real world problems).

Learning trig the right way

So, let’s start with the simplest of all circles: a unit circle centered at the origin, O, of a Cartesian coordinate plane. Now let’s draw a line from the origin to a point P on the circle, forming an angle \theta with the x-axis. Now, draw one half of a tangent line from P to the x-axis. Label the intersection with the x-axis T. Finally, drop a perpendicular line from P to the x-axis. Label the intersection with the x-axis S. You should end up with a diagram which looks like this.

Now, based on this circle, we can define some functions.

$latex
\sin(\theta) = PS\\
\tan(\theta) = PT\\
\sec(\theta) = OT
$

These are the three fundamental trigonometric functions, and we can see that they all come from the circle. We also get another neat bonus. Notice that \tan(\theta), the tangent function, is actually related to the length of a tangent! Also, \sec(\theta), the secant function, is related to a secant! These functions have these names for a perfectly good reason, but you only get to see this meaning when you define them in terms of a circle. The word “sine” has a complicated origin but can be traced back to a Sanskrit word for “half-chord”. This is exactly what the line PS is, but you need a circle to have a chord.
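
If you want to convince yourself that these lengths really are the familiar functions, a few lines of Python computing the construction directly will do it (the angle used here is arbitrary):

import math

theta = 0.7                                      # any angle in the first quadrant

O = (0.0, 0.0)
P = (math.cos(theta), math.sin(theta))           # point on the unit circle
S = (P[0], 0.0)                                  # foot of the perpendicular from P

# The tangent at P runs in the direction (-sin, cos); follow it until y = 0.
t = -math.sin(theta) / math.cos(theta)
T = (P[0] - t * math.sin(theta), P[1] + t * math.cos(theta))

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

print(dist(P, S), math.sin(theta))               # PS matches sin(theta)
print(dist(P, T), math.tan(theta))               # PT matches tan(theta)
print(dist(O, T), 1.0 / math.cos(theta))         # OT matches sec(theta)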

If we now perform a “complementary” construction, focusing on the y-axis, we get three more functions. Extend the tangent to intercept the y-axis and label the intersection point T^\prime. Drop a perpendicular from P to the y-axis and label the point S^\prime. Your diagram should now look something like this:

We can now define three “complementary” functions,

$latex
\textrm{cosin}(\theta) = PS^\prime\\
\textrm{cotan}(\theta) = PT^\prime\\
\textrm{cosec}(\theta) = OT^\prime,
$

or \cos(\theta), \cot(\theta) and \csc(\theta) as they are more commonly known. These co-functions correspond directly to the fundamental functions in an obvious and simple way. Sine is the perpendicular distance to the x-axis, cosine is the perpendicular distance to the y-axis. Tan is the tangent distance to the x-axis, cotan is the tangent distance to the y-axis. Sec is the distance along the x-axis to the tangent intercept, cosec is the distance along the y-axis to the tangent intercept.

There are certain neat symmetries which become really obvious when we have this picture of the trig functions at our disposal. For example, if \theta = 45^\circ then we have a symmetric picture and we’ll get

$latex
\cos(45^\circ) = \sin(45^\circ)\\
\cot(45^\circ) = \tan(45^\circ)\\
\csc(45^\circ) = \sec(45^\circ).
$

This symmetry about 45^\circ also tells us that we can get a graph of a co-function by flipping its fundamental function around the line \theta = 45^\circ.

In fact, almost all of the interesting properties of the trigonometric functions shine through when we consider circles, while remaining hidden and obscured if we just think about triangles. This is all due to the simple fact that trigonometry is really the study of circles, not triangles.

It’s time for a change

What can we conclude from this change in perspective on trigonometry?

Well, we should not be teaching students about trigonometry using triangles. It is a confusing abstraction with few real world applications. Indeed, there are better systems for solving geometric problems involving triangles, which do not rely at all on the trigonometric functions.

Trigonometry in the real world is all about circles, and we do a disservice to students by pretending that it’s all about triangles. By teaching trigonometry based on circles we can provide students with a deeper understanding of the fundamental nature of the functions involved, while preparing them to use these functions effectively in their future studies.

Competitive Higher/Lower – Part 1

At my local pub’s weekly trivia night, one of the rounds involves a game of competitive higher/lower. The quiz master has a particular number in mind and two players take turns to guess the number, being told higher or lower until someone guesses correctly and wins.

Most players tend to pick a number somewhere in the middle of the available range, leading to a reasonably quick resolution of the game one way or another. Last week my friend employed a different tactic, always opting to pick the smallest number in the available range, while the other player picked numbers somewhere in the middle of the range. My friend lost.

This week I got to play and employed the same tactic, as we’d convinced ourselves through some vague handwaving that it was the optimal tactic. I lost.

This tactic makes the game take twice as long as it normally would, and since only two people in the pub get to play the game it is generally met with groans of exasperation from the unamused room of punters.

Two losses from two attempts (and a good dose of mocking from the quiz master) didn’t seem like a good advertisement for the tactic, but it was also a very small sample size. I wanted to know if our tactic was any good, and if so, how good. So I did some maths which I’d like to share with you here.

The game at the pub typically involves guessing, to the exact cent, the value of a bag of goodies. An initial range is given, say between 3500 cents ($35) and 3600 cents ($36). To make things easier to talk about, let’s define a range [a, b] as all the integers between a and b (inclusive). Also, let’s call the spread of a range the cardinality of the range, or the number of numbers in the range. So the spread of [a, b] would be b - a + 1.

With these definitions in mind, let’s consider some basic situations in the game. If the available range has a spread of 1, then you have one choice – the correct answer – and (unless you make a terribly embarrassing mistake) you have a 100% chance of winning. If there is a spread of 2, you have a 50/50 chance of picking the right answer, and if you get it wrong the other person is left with a spread of 1, meaning they’ll win. So, a spread of 2 gives a 50% chance of winning.

Now, if there’s a spread of 3 then things get interesting. If you pick the middle number, you have a 1/3 chance of winning. But you have a 2/3 chance of losing, and if you lose your opponent faces a spread of 1 either way, meaning they win. So picking the middle option in a spread of 3 gives you a 33.3% chance of winning.

If, on the other hand, you pick a number not in the middle, you still have a 1/3 chance of winning on that move, but if you get it wrong then your opponent is faced with a spread of 2, meaning they only have a 50% chance of winning. So your total odds are (1/3) + (2/3)*(1/2) = 2/3. Picking the edge number gives you a 66.6% chance of winning!

We see that in this very simple case, picking an edge number is clearly a better option than picking the middle number. Does this generalise to larger spreads, or is it only relevant for a spread of 3? Well, let’s see if we can come up with a formula for our chances for any spread.

To do this we’ll have to be a little more rigorous. First, let’s define the tactics of two players. Player 1 always picks the lowest number in the available range. Player 2 always picks the middle number of the range (for an odd spread, there is a single “middle” number for them to pick. For an even spread there are two options, so let’s say they pick the smaller of the two equally “middle” values).

We want to calculate a function f(n) which gives the probability of Player 1 winning if they go first in a game which starts with a spread of n. The first few cases are easy:

f(1) = 1 (if there’s only one choice, you’d better win!)
f(2) = \frac{1}{2} (as we proved above.)
f(3) = \frac{2}{3} (also shown above.)

Now, let’s see if we can come up with a recurrence relation which uses these base cases. Let’s start in the case where n is even. Player 1 picks their number, the smallest in the range. It’s either correct, and they win \left(\frac{1}{n}\right), or it’s incorrect \left(\frac{n-1}{n}\right) and Player 2 gets a turn with a spread of n-1. If Player 2 guesses correctly, \left(\frac{1}{n-1}\right), then Player 1 loses. If Player 2 guesses incorrectly \left( \frac{n-2}{n-1}\right) then Player 1 will get another crack at a spread of \frac{n-2}{2}. Putting this all together gives

$latex
f(n) = \frac{1}{n} + \frac{n-1}{n} \frac{n-2}{n-1}f\left(\frac{n-2}{2}\right)\newline
\,\,\, = \frac{1}{n}\left(1 + (n-2)f\left(\frac{n-2}{2}\right)\right).
$

In the case where n is odd, there is still a \frac{1}{n} chance of Player 1 winning on that turn and an \frac{n-2}{n} chance of Player 2 getting a chance and missing, but now if Player 2 gets it wrong things are a little tricky. Player 2 faced an even spread, so picked one of two equally good “middle” values, but if he’s wrong then there are two different spreads that Player 1 might face on their next turn. Out of the n-2 possibly correct values which Player 2 didn’t pick, \frac{n-1}{2} will be on one side of the number he did pick (leaving Player 1 with a spread of \frac{n-1}{2}) and \frac{n-1}{2} - 1 will be on the other side (leaving Player 1 with a spread of \frac{n-1}{2} - 1).

So, we put all this together and get (in the odd case)

$latex
f(n) = \frac{1}{n} + \frac{n-2}{n} \frac{1}{n-2} \left(\frac{n-1}{2}f\left(\frac{n-1}{2}\right) + \left(\frac{n-1}{2} - 1\right)f\left(\frac{n-1}{2} - 1\right) \right)\newline
= \frac{1}{n}\left(1 + \frac{n-1}{2}f\left(\frac{n-1}{2}\right) + \left(\frac{n-1}{2} - 1\right)f\left(\frac{n-1}{2} - 1\right) \right).
$

Plugging in n=3 we get

$latex
f(3) = \frac{1}{3}\left(1 + f(1) + 0 \right)\newline
= \frac{2}{3},
$

which is consistent with our original argument.

So now let’s apply this to the pub trivia scenario, where we begin with a spread of 99. Expanding out the terms we need to calculate and then back substituting gives

$latex
f(99) = \frac{1}{99}(1 + 49f(49) + 48f(48))\newline
f(49) = \frac{1}{49}(1 + 24f(24) + 23f(23))\newline
f(48) = \frac{1}{48}(1 + 46f(23))\newline
f(24) = \frac{1}{24}(1 + 22f(11))\newline
f(23) = \frac{1}{23}(1 + 11f(11) + 10f(10))\newline
f(11) = \frac{1}{11}(1 + 5f(5) + 4f(4))\newline
f(10) = \frac{1}{10}(1 + 8f(4))\newline
f(5) = \frac{1}{5}(1 + 2f(2) + 1f(1))\newline
f(4) = \frac{1}{4}(1 + 2f(1))\newline
= \frac{3}{4}\newline
f(5) = \frac{3}{5}\newline
f(10) = \frac{7}{10}\newline
f(11) = \frac{7}{11}\newline
f(23) = \frac{15}{23}\newline
f(24) = \frac{15}{24}\newline
f(48) = \frac{31}{48}\newline
f(49) = \frac{31}{49}\newline
f(99) = \frac{63}{99}.
$

So in the pub trivia situation Player 1 would beat Player 2 on 63 out of every 99 games.
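
This whole back-substitution can also be checked with a few lines of Python implementing the recurrence with exact fractions:

from fractions import Fraction

memo = {0: Fraction(0), 1: Fraction(1), 2: Fraction(1, 2)}

def f(n):
    # Chance that the edge-picking player wins when going first on a spread of n.
    if n not in memo:
        if n % 2 == 0:
            memo[n] = Fraction(1, n) * (1 + (n - 2) * f((n - 2) // 2))
        else:
            k = (n - 1) // 2
            memo[n] = Fraction(1, n) * (1 + k * f(k) + (k - 1) * f(k - 1))
    return memo[n]

print(f(3), f(99))   # 2/3 and 7/11 -- i.e. 63 wins out of every 99 games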

This is a reasonably solid improvement, and demonstrates that using this tactic could definitely help you to win more prizes at pub trivia. The obvious disadvantage of course is that everyone in the pub will think you’re a bit of a dick :-)

Pentagons with Rational Trigonometry

The other day I wanted to work out the properties of a pentagon, for reasons which I might go into in a future post. In particular I wanted to know the horizontal and vertical dimensions of a pentagon with unit length “spokes”.

I started scratching out diagrams and ended up with a mess of lines and stupid values like \sin\left(\frac{\pi}{5}\right). It all quickly became a mess and gave me no precise numbers. Everything was in terms of trig functions and pi. Don’t get me wrong, pi’s a great number to work with if you’ve got circles, but I’m just dealing with straight lines here, I shouldn’t need transcendental numbers!

So, I reverted back to good old rational trigonometry. Rational trigonometry uses the concept of spread to measure the angle between lines, and quadrance to measure the extent between points. If the angle between two lines is \theta then the spread, s, is equal to \sin^2\theta. The quadrance between two points is just the distance squared. It turns out that using these measures makes calculations much simpler than in traditional trigonometry.

The diagram below shows a regular pentagon with unit length (and thus unit quadrance) spokes. We shall use rational trigonometry to compute exact values for all the labels shown.

The first (and most complex) step is to compute s_1 and s_2. We note that s_1 is one fifth of a full revolution and s_2 is one fifth of a half revolution. Both a half and a full revolution represent a spread of zero. Therefore s_1 and s_2 are both solutions of the 5th order spread polynomial,

S_5(s) = s(16s^2 - 20s + 5)^2 = 0.

We use the quadratic formula to find the two non-zero solutions to this equation and find

 s_1 = \frac{5 + \sqrt{5}}{8}\\  s_2 = \frac{5 - \sqrt{5}}{8}.

A corollary of the triple spread law when applied to right angled triangles gives

s_3 = 1 - s_2 = \frac{3 + \sqrt{5}}{8}.

By the spread law we can now directly find

 Q_2 = s_2 = \frac{5 - \sqrt{5}}{8}\\  Q_3 = s_3 = \frac{3 + \sqrt{5}}{8}\\  Q_4 = s_1 = \frac{5 + \sqrt{5}}{8}\\  Q_5 = 1 - s_1 = \frac{3 - \sqrt{5}}{8}.

Applying the triple quad formula to a bisected line we get

Q_1 = 4Q_2 = \frac{5 - \sqrt{5}}{2}.

Pythagoras’ theorem now gives us

Q_6 = Q_1 - Q_4 = \frac{15 - 5\sqrt{5}}{8} = 5Q_5.

Once again applying the spread law we get

 Q_7 = \frac{s_3}{s_1} = \frac{5 + \sqrt{5}}{10} = \frac{4}{5}Q_4\\  Q_8 = \frac{1 - s_1}{s_1} = \frac{5 - 2\sqrt{5}}{5}.

So, there we have it, all the interesting dimensions of a pentagon without once having to resort to transcendental numbers or functions.
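
If you’d like to double check the algebra, a short sympy session confirms the spreads against their classical \sin^2 values and reproduces a few of the quadrances above (purely a cross-check; the rational working never needed it):

import sympy as sp

s = sp.symbols('s')

# Non-zero roots of the fifth spread polynomial S_5(s) = s*(16*s**2 - 20*s + 5)**2.
s2, s1 = sorted(sp.solve(16*s**2 - 20*s + 5, s))

# Cross-check against the classical definition: spread = sin^2(angle).
assert s1.equals(sp.sin(2*sp.pi/5)**2)       # one fifth of a full revolution
assert s2.equals(sp.sin(sp.pi/5)**2)         # one fifth of a half revolution

# A couple of the derived quadrances.
assert (4*s2).equals((5 - sp.sqrt(5)) / 2)                # Q_1
assert ((1 - s2) / s1).equals((5 + sp.sqrt(5)) / 10)      # Q_7 = s_3 / s_1
assert ((1 - s1) / s1).equals((5 - 2*sp.sqrt(5)) / 5)     # Q_8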

If you want to know more about rational trigonometry I strongly suggest watching this series of short lectures by Norman Wildberger.

Sorting a deck of cards the fast way

I have a deck of cards on my desk at work, which I play with to keep my hands busy while thinking about tricky problems. I sometimes find myself trying to implement different sorting algorithms to put the deck in order and to no great surprise, they all tend to suck.

The algorithmically fast sorts, like quick-sort or merge-sort, require repeated splitting of the deck until you essentially have all 52 cards laid out in front of you, which isn’t very practical. The algorithmically slow ones, like insertion, selection and bubble sort, let you keep all the cards in your hands, but are held back by their fundamental complexity.

In practice to sort a deck of cards I’ll usually do a bucket + human sort: deal the cards out into four buckets, one for each suit, and then pick up the 13 cards for each suit and sort them in hand, using no fixed algorithm. This tends to be fast in practice, but lacks a certain elegance. It doesn’t scale from a deck of cards to an arbitrary set of numbers to be sorted on a computer.

An ideal sorting algorithm would be a) easy to implement with a physical deck of cards and b) algorithmically fast (i.e. O(n*log(n))) when thought of abstractly.

After endless noodling around I stumbled on the following algorithm which neatly fits both criteria. I was pleasantly surprised when I did, because usually when I think I’ve found a new sorting algorithm I think a bit harder and it turns out I’ve just reinvented quick-sort! So, here’s my shiny new* algorithm.

Phase 1: Dealing

  1. Deal the first card from the deck face up on the table. This is the first pile.
  2. Take the next card from the deck. If it is smaller than the card in the first pile, place it on top of the first pile. If it’s larger than the card in the first pile, create a second pile to the right of the first pile.
  3. Deal each subsequent card, placing it on the pile which has the smallest top card which is larger than the card being dealt. If the top card of all piles is smaller than the card being dealt, create a new pile to the right of the other piles.

Phase 2: Collecting

  1. Pick up the smallest card. It will be on top of the left-most pile.
  2. Pick up the next smallest card, it will be on top of one of the piles.
  3. Continue picking up cards in order. The next card will always be on top of one of the piles.
  4. Play a game of cards!

So, let’s look at how it works. The key is in the invariant properties which we maintain. Each time we deal a new card in Phase 1, the piles maintain two invariants:

  1. Each pile is sorted (low→high, top→bottom)
  2. The top cards of the piles are sorted (low→high, left→right)

If you look at the dealing phase, you’ll notice we always place cards either on top of a larger card or into a new pile of their own (Property 1). We also make sure we place each card on the smallest possible top card, which maintains Property 2.

It turns out that for manual sorting with a deck of cards, Property 2 is not so important, so we break it in the collection phase, but we’ll come back to it in the algorithmic analysis.

Property 1 is what lets us do the collection phase quickly. Since each pile is sorted, the smallest remaining card must be on top of one of the piles. If it was below another card then that pile would not be sorted! So, we know that we can always find the next card and we can quickly pick up all the cards in order.

Practical complexity

In practice I’ve found I generally end up with 8 or 9 piles. This is a small enough number that working out where to deal each card is very quick, as is finding which card to pick up next in the collection phase. The end result is that each card is only handled twice; once when we deal it and once when we collect it. This means the algorithm is effectively O(n), given the physical movement of cards totally dwarfs the ~O(n*log(n)) visual processing steps required.

Algorithmic complexity

Now let’s look at how this method performs in a purely algorithmic context, and make sure it is indeed O(n*log(n)). We’ll deal with each phase separately.

In the dealing phase, for each card we need to do an insertion into a sorted list (i.e, the list of top cards on the piles). Using a binary search each insertion can be done in log(k) steps, where k is the number of piles. The worst case is a deck sorted smallest to largest which will result in a new pile for each card. So in the worst case we have k=n, and we need to do n insertions, leading to O(n*log(n)) operations. In the general case we have O(n*log(k)) where k is the number of piles.

In the collection phase we need to be a little bit careful and add one extra step. When working manually we break the left→right sorting (Property 2) as we pick up cards. This isn’t a problem, since we can visually find the next smallest card even if the top cards are unsorted. Algorithmically if we broke Property 2 the collection phase would require a search through k elements to pick up each card, leading to O(n*k) performance, so we need to maintain Property 2 in log(k) time at each step.

If we maintain Properties 1 and 2 then the smallest card will always be on the left-most pile. So, algorithmically, we take this card. This exposes a new top card on the left-most pile. To maintain Property 2 we now take this pile and insert it amongst the other piles to keep the left→right ordering. This operation is again an insertion into a sorted list, which can be done in log(k) operations. This means our modified collection process has O(n*log(k)) complexity. The insertion step is crucial algorithmically, but is not worth the physical cost when manually sorting cards.
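
For completeness, here’s a sketch of the whole thing in Python, using a bisection search for the dealing step and the pile re-insertion trick for the collection step (the plain Python lists used here make the pile shuffling O(k) rather than O(log(k)), but the structure of the algorithm is the same):

import bisect

def pile_sort(cards):
    # Phase 1: dealing. Each pile is a list with its top card last, and the
    # piles are kept ordered so that their top cards increase left to right.
    piles = []
    tops = []                                   # tops[i] is the top card of piles[i]
    for card in cards:
        i = bisect.bisect_right(tops, card)     # leftmost pile whose top card is larger
        if i == len(piles):
            piles.append([card])                # no such pile: start a new one on the right
            tops.append(card)
        else:
            piles[i].append(card)               # place the card on that pile
            tops[i] = card                      # its top card is now this smaller card

    # Phase 2: collecting. The smallest remaining card is always on the
    # left-most pile; after taking it, re-insert that pile to keep the
    # top cards sorted (Property 2).
    result = []
    while piles:
        pile = piles.pop(0)
        tops.pop(0)
        result.append(pile.pop())               # take the smallest remaining card
        if pile:
            i = bisect.bisect_right(tops, pile[-1])
            piles.insert(i, pile)
            tops.insert(i, pile[-1])
    return result

assert pile_sort([3, 1, 4, 1, 5, 9, 2, 6]) == [1, 1, 2, 3, 4, 5, 6, 9]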

So, both phases can be done in O(n*log(k)) steps, with a worst case of O(n*log(n)). So it’s algorithmically fast!

Conclusion

In summary, this algorithm is super efficient for manually sorting a deck of cards, requiring exactly 2 handlings of each card. It also matches other fast sorting algorithms in a purely abstract sense by having a worst case complexity of O(n*log(n)).

So next time you need to sort a deck of cards, whip out this trick and impress your friends with how quickly and efficiently you can put things in order. No doubt they’ll be extremely jealous of your mad skills!

* It’s new to me anyway. No doubt someone’s invented this before me, so if you’ve seen it and know what it’s called please let me know, I haven’t bothered to go hunting for it myself.