Monday, March 26, 2012

Project Euler 37

Problem 37 of Project Euler has to do with finding the intersection of right and left truncatable primes. For example 73 is a prime and if you take off the left most digit, you will still have a prime. Same goes for the right most digit. The problem states that there are only 11 of these such primes. This sounds like an intriguing fact, however you quickly should come to the realization that this is highly dependent on the base in which you are counting.

Take, for instance, base 1, in which you simply make tally marks. So the first four primes would be:

II (2)
III (3)
IIIII (5)
IIIIIII (7)

Obviously there are zero left or right truncatable primes, assuming you do not count 1 as a prime. If you do there are 2. This should be pretty clear because when you truncate there is simply only one number it could be. If you did the same thing with base 2 (binary) then there are two numbers it could be, this should increase the chances of a prime being right truncatable. Here are the primes that are right truncatable (assuming that we stop when we hit 2 or 3):

1011 (11)
111 (7)
101 (5)

So as our intuition predicted, there are more right-truncatable primes here. This actually happens each time you increase the base in which you count. According to The Prime Glossary base 10 has 83 right truncatable primes, and base 42 has 175734. So the number of primes that are both right and left truncatable in a given base is simply somewhat arbitrary so I will not go further into depth about that.

My first attempt at solving this problem approaches it simply by writing a function that determines if a prime is right and left truncatable and simply iterating over all of the primes until I find 11 that pass the function. This takes about 5 minutes and 30 seconds to run. The code first opens a file containing a large number of primes and stores it in an array. Then it simply iterates over every prime, checking with the 'truncatable' function to see if it should add it to the list. Hopefully this python code should be self explanatory.

with open("../primes") as f:
    primes = [x.strip('\n') for x in f.readlines()]

primes = primes[1:]
#1 is contained in the array, we should remove it
def truncatable(number):
    for i in range(1,len(number)):
        if not number[i:] in primes:
            return False
        if not number[:-i] in primes:
            return False
    return True

truncates = set()
for prime in primes:
    if len(prime) > 1:
        if truncatable(prime):
            truncates.add(prime)
    if len(truncates) == 11:
        break

print sum(map(int,truncates))
obviously this is not the most efficient way of doing this. I decided to redo the problem. This time I was going to create the list of right truncatable primes and a list of left truncatable primes. Then whenever they both contain a prime (the intersection of the two sets) I will know that that prime is both right and left truncatable. Creating the lists for left or right truncation is fairly easy. I simply start with the list of single digit primes and then add all of the possible additions to the respective sides of the numbers. Then I filter out all of the non-prime numbers I have obtained and pass on the current list to see if we found any both right and left truncatable primes. This approach works significantly better, as it searches a much smaller amount of numbers, instead of searching through all primes it simply search all of the numbers that could be truncated to a right or left truncatable prime. This approach only takes about 14 seconds, a huge difference in speed. A note on the implementation, if you know what a generator in python is, feel free to skip this. I used a special type of function in python. In the function you will find yield instead of return. I find the best way to think about this is that the function simply spits out the values as it would with return, but it keeps its place in its execution. None of its variables are reset and it simply waits to continue computing until the next values are asked for. If you look at the code and are still having problems understand what is happening here is a great resource on generators.
with open("../primes") as f:
    primes = [x.strip('\n') for x in f.readlines()]

def left_truncatable():
    numbers = ["2","3","5","7"]
    extenders = [str(x) for x in range(1,10)]
    while True:
        temp = []
        for number in numbers:
            for extender in extenders:
                temp.append(extender + number)
        numbers = filter(primes.__contains__, temp)
        yield numbers

def right_truncatable():
    numbers = ["2","3","5","7"]
    extenders = ["1","3","7","9"]
    while True:
        temp = []
        for number in numbers:
            for extender in extenders:
                temp.append(number + extender)
        numbers = filter(primes.__contains__, temp)
        yield numbers

   
finals = set()
rights = right_truncatable()
lefts = left_truncatable()
while len(finals) < 11:
    finals.update(set(rights.next()).intersection(lefts.next()))
    
print sum([int(x) for x in finals])

If any of this code is hard for you to read please leave a comment and I will attempt to resolve the confusion. As an addendum I decided to see if I could speed up the code any more by shrinking the number of primes I have to check against by see if they were simply the same length. This has the ability to speed up the code if the .__contains__ function checks every single element to see if the element it is looking for is there. However the added cost of pre-filtering the primes must have canceled out any gains I saw by only having to check a smaller list and simply added to the complexity of the code, something you should never want. Therefore I am omitting posting the code.

4 comments:

  1. Part of the fun is the big increase in speed between the two coding methods. Is there a guiding principle behind the gain in speed?

    ReplyDelete
  2. I'll bet there is decent guess about how many truncatable primes there should be for any base b.

    ReplyDelete
  3. Unfortunately a long response was just lost, I will attempt to replicate it, but I may miss some of my points.

    First off I fully believe that there is a way to get a decent guess for the number of truncatable primes in a given base. I would personally approach this number probabilistically. First off there would be a probability given a number N that N its self is a prime. Given Prime number theorem http://en.wikipedia.org/wiki/Prime_number_theorem, the probability that that number N is prime is ln(N). Then we need to define a probability that at least one of the numbers that this number can 'become' by appending a number to the left or right of it is also prime. Lets concern our selves with left truncatable only because it has a nice property of the probabitility of the next primes being prime as more or less independent. So the probability that you can find a number to append that results in another prime number should be roughly ln(N*b)*b. This is because the number will be bigger by at least one order of magnitude (you could define this more rigorously, but that is beyond the scope of this blog) and there would be b different possible numbers to append to the end of it.

    Please correct me if I am wrong.

    ReplyDelete
  4. In reference to the speed up there is a very neat trick that I do to make it faster. Sorry if it was not clear what I did.

    Basically I have both reduced the amount of numbers that I have to search and the amount of computation I have to do for any number that I search of length n.

    For the first approach for any number I have to check all of its substrings to see if they are prime. That means I have to perform at least 2*n computations given n as the length of the prime.

    For the second approach I simply recognized that if I have a list of right truncatable primes I can discover all of the right truncatable primes of a length one longer. I do this by simply appending every possible ending (1,3,7,9) to every current known right truncatable prime of length n. Then I simply filter out all of the ones which are not prime. An example is shown below

    (2,3,5,7) starting list of primes
    (21,23,27,29,31,33,37,39,71,73,77,79,91,93,97,99) all of the possible candidates.
    (23,29,31,37,71,73,79,97) all of the non-primes filtered out

    This way once I get up to very large lengths I only have to do as many computations as there are numbers in my current list, which actually shrinks as we go on (the larger the number the lower the probability that it is prime).

    I hope this is a bit more clear. Please let me know if reading this then reading the code does not clear things up.

    ReplyDelete