Friday, March 30, 2012

Project Euler 15

In problem 15 of Project Euler the problem of finding how many paths are possible on a 20x20 grid. To start us off project euler provides a very nice image of all of the possible paths through a 2x2 grid:

Hopefully it should be obvious how many paths there are, there are 6 paths that are possible. Surely there is a nice mathematical way to do this, but for now lets think about this as simply a problem to code your way out of. First off all we need to choose a data representation for this problem. It should be pretty obvious that the number of times the arrow goes 'east' for one square is the same as well as the number of times the arrow goes 'south' for one square. So we can then think about this problem as saying we need to make a certain number of 'east' moves and a certain number of 'south' moves. So when we start we have two choices, we can use a 'south' move or a 'east' move. So each time we still have at least one 'south' move and one 'east' move left we have to go down two different paths, but if we don't have any 'east' moves left there can only be one path to go down, same goes for 'south' moves. So we could write a function in python that looks like this:




def f(downs,lefts):
    if downs > 0 and lefts > 0:
        return f(downs - 1,lefts) + f(downs, lefts - 1)
    else:
        return 1

if __name__ == "__main__":
    print f(20,20)

This will solve the problem, but as the size of the grid gets bigger and bigger the number of times that we have to call the funciton increases expotentially. I am not even sure how long it would take to get the answer. But because this function is deterministic, meaning that it will always return the same output given the same inputs, we can use a techinique called memoization. The general plan for what we will do is look up in a hashmap the input every time we run the function and see if we have already run the function on those inputs, if we have we simply output what we got last time.
def memoize(f):
    cache = {}
    def g(x,y):
        if (x,y) not in cache:
            cache[(x,y)] = f(x,y)
        return cache[(x,y)]
    return g

@memoize
def f(downs,lefts):
    if downs > 0 and lefts > 0:
        return f(downs - 1,lefts) + f(downs, lefts - 1)
    else:
        return 1

if __name__ == "__main__":
    print f(20,20)

This code uses something called decorators. Basically what it does it every time the fuction f is called memoize's g function is called instead. So as you can see the g funtion first checks to see if the input is in a cache and then, if it is not, it runs the function and stores the result in the cache. This makes the program execute in less than 0.005 seconds. This is an amazing speed up given that I tried to run the first program when I started writing this blog post and it has still not finished.

Monday, March 26, 2012

Project Euler 37

Problem 37 of Project Euler has to do with finding the intersection of right and left truncatable primes. For example 73 is a prime and if you take off the left most digit, you will still have a prime. Same goes for the right most digit. The problem states that there are only 11 of these such primes. This sounds like an intriguing fact, however you quickly should come to the realization that this is highly dependent on the base in which you are counting.

Take, for instance, base 1, in which you simply make tally marks. So the first four primes would be:

II (2)
III (3)
IIIII (5)
IIIIIII (7)

Obviously there are zero left or right truncatable primes, assuming you do not count 1 as a prime. If you do there are 2. This should be pretty clear because when you truncate there is simply only one number it could be. If you did the same thing with base 2 (binary) then there are two numbers it could be, this should increase the chances of a prime being right truncatable. Here are the primes that are right truncatable (assuming that we stop when we hit 2 or 3):

1011 (11)
111 (7)
101 (5)

So as our intuition predicted, there are more right-truncatable primes here. This actually happens each time you increase the base in which you count. According to The Prime Glossary base 10 has 83 right truncatable primes, and base 42 has 175734. So the number of primes that are both right and left truncatable in a given base is simply somewhat arbitrary so I will not go further into depth about that.

My first attempt at solving this problem approaches it simply by writing a function that determines if a prime is right and left truncatable and simply iterating over all of the primes until I find 11 that pass the function. This takes about 5 minutes and 30 seconds to run. The code first opens a file containing a large number of primes and stores it in an array. Then it simply iterates over every prime, checking with the 'truncatable' function to see if it should add it to the list. Hopefully this python code should be self explanatory.

with open("../primes") as f:
    primes = [x.strip('\n') for x in f.readlines()]

primes = primes[1:]
#1 is contained in the array, we should remove it
def truncatable(number):
    for i in range(1,len(number)):
        if not number[i:] in primes:
            return False
        if not number[:-i] in primes:
            return False
    return True

truncates = set()
for prime in primes:
    if len(prime) > 1:
        if truncatable(prime):
            truncates.add(prime)
    if len(truncates) == 11:
        break

print sum(map(int,truncates))
obviously this is not the most efficient way of doing this. I decided to redo the problem. This time I was going to create the list of right truncatable primes and a list of left truncatable primes. Then whenever they both contain a prime (the intersection of the two sets) I will know that that prime is both right and left truncatable. Creating the lists for left or right truncation is fairly easy. I simply start with the list of single digit primes and then add all of the possible additions to the respective sides of the numbers. Then I filter out all of the non-prime numbers I have obtained and pass on the current list to see if we found any both right and left truncatable primes. This approach works significantly better, as it searches a much smaller amount of numbers, instead of searching through all primes it simply search all of the numbers that could be truncated to a right or left truncatable prime. This approach only takes about 14 seconds, a huge difference in speed. A note on the implementation, if you know what a generator in python is, feel free to skip this. I used a special type of function in python. In the function you will find yield instead of return. I find the best way to think about this is that the function simply spits out the values as it would with return, but it keeps its place in its execution. None of its variables are reset and it simply waits to continue computing until the next values are asked for. If you look at the code and are still having problems understand what is happening here is a great resource on generators.
with open("../primes") as f:
    primes = [x.strip('\n') for x in f.readlines()]

def left_truncatable():
    numbers = ["2","3","5","7"]
    extenders = [str(x) for x in range(1,10)]
    while True:
        temp = []
        for number in numbers:
            for extender in extenders:
                temp.append(extender + number)
        numbers = filter(primes.__contains__, temp)
        yield numbers

def right_truncatable():
    numbers = ["2","3","5","7"]
    extenders = ["1","3","7","9"]
    while True:
        temp = []
        for number in numbers:
            for extender in extenders:
                temp.append(number + extender)
        numbers = filter(primes.__contains__, temp)
        yield numbers

   
finals = set()
rights = right_truncatable()
lefts = left_truncatable()
while len(finals) < 11:
    finals.update(set(rights.next()).intersection(lefts.next()))
    
print sum([int(x) for x in finals])

If any of this code is hard for you to read please leave a comment and I will attempt to resolve the confusion. As an addendum I decided to see if I could speed up the code any more by shrinking the number of primes I have to check against by see if they were simply the same length. This has the ability to speed up the code if the .__contains__ function checks every single element to see if the element it is looking for is there. However the added cost of pre-filtering the primes must have canceled out any gains I saw by only having to check a smaller list and simply added to the complexity of the code, something you should never want. Therefore I am omitting posting the code.

Wednesday, March 21, 2012

Project Euler 67

Project Euler 67 is about finding the maximal path through a series of nodes. In this case it is the sum of integers in a triangle, from top to bottom. This is a simpler version of the problem which the Viterbi Algorithm solves. The Viterbi Algorithm is used primarily in HMMs where you want to find a minimal path through K layers of N nodes, and every node can travel to any of the nodes in the following layers. There is an added complication of there being a transition cost, each different node has a cost of going to another node. A small example, with the transition costs being either 0, 1 or 2. Each depending on how far away the previous number is.

1 3 4 5
7 9 1 2
6 3 8 0

So to solve this the realization that only the preceeding nodes matter for any given node. Therefore we can simplify the problem by running one layer at a time and updating their weights such as:


1 3 4 5
7 9 1 2
6 3 8 0


4 = 3 + max(1+0 or 7+1 or 6+2)
11 = 9 + max(1+1 or 7+0 or 6+1)
5 = 3 + max(1+2 or 7+0 or 6+0)
1 4 4 5
7 11 1 2
6 5 8 0

8 = 4 + max(4+0 or 11+1 or 5+2)
etc
1 4 8 5
7 11 6 2
6 5 13 0

etc
1 4 8 12
7 11 6 8
6 5 13 7

So we can see here that 5 is the lowest possible value, and we only had to visit each node once!

The same can be applied to the simpler problem of the triangle. Again an example is provided below, this time finding the maximal path.
3
7 4
2 4 6
8 5 9 3
3
10 7
2 4 6
8 5 9 3
3
10 7
12 14 13
8 5 9 3
3
10 7
12 14 13
20 19 23 16
So here we can see that the maximal path is 23 any we only had to visit each node once. So now we have a plan, lets implement it in python.

First Implementation:
if __name__ == "__main__":
    numbers = []
    with open("triangle.txt") as f:
        for line in f.readlines():
            numbers.append(map(lambda x: int(x),line.split()))
    for i in range(1,len(numbers)):
        for j in range(len(numbers[i])):
            if j == 0: # on the left side of the triangle
                numbers[i][j] += numbers[i-1][j]
            elif j == len(numbers[i]) - 1: # on the right side of the triangle
                numbers[i][j] += numbers[i-1][j-1]
            else:
                numbers[i][j] += max(numbers[i-1][j-1],numbers[i-1][j])
    print max(numbers[-1])
This code runs in ~ 0.05 seconds. Very fast given that the brute force solution would take about 5 billion years.

So the program essentially first creates an array in which we will store all of the numbers, then we get to this like
          numbers.append(map(lambda x: int(x),line.split()))
This may be a bit confusing for people not accustomed to lambda functions or the map function.

Basically line.split(), if line is "12 5 29" will return ["12","5","29"]. So we want to transform each element from a string to an integer. map(function,list) will run function on each element in a list and return the resulting list. lambda here creates a temporary function that takes in a variable x and applies int(x) to it. This is an example of bad code, I have become too accustomed to using lambda functions each time I use map, so lambda functions are not needed in the slightest. An equivalent line of code would be:

          numbers.append(map(int,line.split()))
Going forward hopefully the code should be self explanatory, if not a bit ugly. I don't like to have so many if statements so I while perusing the forums I found a cleaner way to solve this problem.

if __name__ == "__main__":
    numbers = []
    with open("triangle.txt") as f:
        for line in f.readlines():
            numbers.append([0] + [int(x) for x in line.split()] + [0]) 
            #credit goes to ArhturDenture in the fourms for a more pythonic way
    for i in range(1,len(numbers)):
        for j in range(1,len(numbers[i]) - 1):
            numbers[i][j] += max(numbers[i-1][j-1],numbers[i-1][j])
    print max(numbers[-1])



So here I was able to not only pad the array so I no longer have to have the sanity checks, but I removed the lambda function and the map function. The run time for this is essentially cut in half (0.028 sec)

So first of all the triangle will now look something like this:

0 3 0
0 7 4 0
0 2 4 6 0
0 8 5 9 3 0


This will eliminate the need to check to see if we are on either side of the triangle, so long as we don't modify the sides the interior of the triangle will always be greater than zero, or it will not matter. Secondly we were able to both remove the lambda and the map functions by doing it in a more pythonic way. Basically when there is a for loop inside the brackets for a list the for loop will be executed and stored as a list. This will execute a lot faster as it does not need to either create a function because of the lambda function or execute the map function.