Remember our earlier discussion of Collatz function and sequences. Here is our task:

Task: Given a positive integer \(n\), find the largest integer in the Collatz sequence seeded by \(n\).1

By now, the task should be a piece of cake for you. Is it? Try and see before proceeding.

Here we go:

def collatz(n):
    """The Collatz function """
    if n%2==0:
        return n//2     # // instead of / for an integer result
    else:
        return 3*n+1

def largest_collatz(n):
    """Return the largest number in the Collatz sequence seeded by n.

    >>> largest_collatz(8)
    8
    >>> largest_collatz(3)
    16
    """
    largest_so_far = n
    while n != 1:
        n = collatz(n)
        if n > largest_so_far:
            largest_so_far = n
    return largest_so_far

Remember our principle “do one thing at a time”. Does it hold for largest_collatz? What was our test? We ask ourselves does this function have parts that might be useful for other tasks? The function enters into a loop, has a specific way to get the next item in the sequence, and keeps track of the largest number seen so far. This appears to be a task we might want to perform for other sequences as well. For example, we could have the task to find the largest number in the aliquot sequence (read “AL-i-kwot”) of a number, where the next number in the sequence is obtained by summing the proper divisors of the current number.2 Like Collatz sequences, aliquot sequences also can fluctuate.3 Assume, although wrongly, that an aliquot sequence always ends at 1.

If our task was to find the largest number in an aliquot sequence, what would have changed in the above code? Just a single name! We would have aliquot instead of collatz, where the former would be the name of the function that computes the next number in the aliquot sequence. This shows that we can separate (1) the task of finding the largest number in a sequence from (2) the task of computing the next number in that sequence.

The function that computes the next aliquot number is an exercise you need to tackle. But let’s assume we already have a function next_aliquot:

def next_aliquot(n):
    """Return the next number in the aliquot sequence seeded by n.

    >>> next_aliquot(12)
    16
    >>> next_aliquot(15)
    9
    """
    # Assume the function is properly defined and usable 
    pass

You are about to pass a milestone in your programming journey, read carefully. To separate the task of finding the largest in a sequence from the choice of a particular sequence, we abstract the sequence function from our largest_collatz function. This is done by replacing the name collatz in our function with the name sequencer (any name will be fine) and make sequencer a parameter of our function. Here is how:

def largest_in_sequence(n, sequencer):
    """Return the largest number in the sequence seeded by n using sequencer function.

    >>> largest_in_sequence(8, collatz)
    8
    >>> largest_in_sequence(3, collatz)
    16
    >>> largest_in_sequence(12, next_aliquot)
    16
    >>> largest_in_sequence(15, next_aliquot)
    9
    """
    largest_so_far = n
    while n != 1:
        n = sequencer(n)
        if n > largest_so_far:
            largest_so_far = n
    return largest_so_far

Of course we no longer call our function largest_collatz. With this function, you can now find the largest number in any sequence defined by a function that computes the next number in that sequence. You just need to pass the appropriate function as the second argument. If you want to find the largest number in a Collatz sequence seeded by 183, call largest_in_sequence(183, collatz), if you are interested in the largest number in an aliquot sequence seeded by the same number, call largest_in_sequence(183, next_aliquot).

There is big lie in the above paragraph. What is it?

Our function would work fine with Collatz sequences, and with some aliquot sequences, but not with any sequence. Because our function assumes that the sequence ends in 1. This does not have to be so in the general case. We have to free ourselves from that 1, you might think. And for someone who has started abstracting functions, abstracting a number should not be a problem: simply add a third parameter for stop_value:

def largest_in_sequence(n, sequencer, stop_value):
    """Return the largest number in the sequence seeded by n using sequencer function. """
    largest_so_far = n
    while n != stop_value:
        n = sequencer(n)
        if n > largest_so_far:
            largest_so_far = n
    return largest_so_far

Seems OK. But is it?

What if I wanted to stop when a number in my sequence becomes odd, prime or, say, one of the prime factors of 38592? These cases cannot be handled by passing a number to my function to be inserted to the right of !=. Any ideas?

Once again I need a more general, more abstract way to handle the problem. Why don’t I use a function parameter again, this time named, say, alive, which when applied to my iteration parameter \(n\) returns True if I need to continue, and False if I need to stop? Here is the updated code:

def largest_in_sequence(n, sequencer, alive):
    """Return the largest number in the sequence seeded by n using sequencer function. """
    largest_so_far = n
    while alive(n):
        n = sequencer(n)
        if n > largest_so_far:
            largest_so_far = n
    return largest_so_far

Now with this function and the function below:

def not_div_7_3(n):
    """Check whether n is NOT divisable by 3 and 7."""
    return not (n % 3 == 0 and n % 7 == 0)

I can find the largest number in the Collatz sequence seeded by 1000 that is encountered before we reach a number divisible by both 3 and 7, with the following call:4

>>> largest_in_sequence(1000, collatz, div_7_3)

Well, we already have gone a long way. Is it time to rest? One final check: Look again at our latest largest_in_sequence function. Does it do “one thing only”? Take your time; but you know, if it did, I wouldn’t ask this question.

It doesn’t. There is still room for abstraction. Again it might not be easy to see, but use our test: Does this function have parts that might be useful for other tasks? What would you need to change in this function if the task was to find the smallest number in a sequence rather than the largest? You might, rightly, think finding the smallest does not make sense in Collatz – why is that by the way? But assume the task is to find not the largest or smallest number in the sequence, but the number with the largest or smallest sum of digits. What would change in the code? Please think before proceeding. Try to come up with your own modification.

Here is another abstraction, this time the comparison bit has been abstracted into a function beats, which when applied to two numbers returns True, if the first number is “better” than the second according to some criterion, and False otherwise. Here is the final code:

def best_in_sequence(n, sequencer, alive, beats):
    """Return the 'best' number according to beats in the sequence seeded by n using sequencer function. """
    best_so_far = n
    while alive(n):
        n = sequencer(n)
        if beats(n, best_so_far):
            best_so_far = n
    return best_so_far

Assume you want to find the largest number in the Collatz sequence seeded by 176 whose sum of digits is even. Assuming you already solved the exercise of writing a function sum_of_digits that computes the sum of digits of a number, all you need is to define the following two functions:

def beats(x, y):
    """Check whether x is greater than y and its sum of digits is even."""
    return sum_of_digits(x) % 2 == 0 and x > y

def alive(n):
    """Check whether n is not 1."""
    return n != 1

and call:

>>> best_in_sequence(176, collatz, alive, beats)

A couple of notes are in order:

  1. The reason I defined the functions we will send as parameters to best_in_sequence with the names alive and beats is just that I couldn’t think of better names. You can name them whatever you like. In this current form, they are totally different from the same names we used in the definition of best_in_sequence. If this point is not perfectly clear to you, please ask me about it.
  2. When we say “do one thing at a time”, we do not rule out functions like our final beats function that do two things: compare two numbers and apply a criterion (like sum of digits even). This is because such functions are small enough and focused enough to be considered as doing one thing. There is no general use of parts of it for other tasks.
  3. Our alive seems a little idiotic to define, after all it just checks whether a number is not equal to 1. We have to do it in this ugly way for some time until we learn about lambda functions. When that shiny day arrives, the call to best_in_sequence would look like this:
    >>> best_in_sequence(176, collatz, lambda x: x!=1, lambda x,y: sum_of_digits(x)%2==0 and x>y)
    

and we will not have to define these small functions separately.

Well. Does the final version of our function best_in_sequence obey our principle “do one thing at a time”?

It very persuasively stands to be obeying the principle. One might say that best_in_sequence does one thing: find the best item in a given sequence using a given criterion. Incidentally, note that the sequence is not passed to our function as a single object but came in three pieces, namely the seed (or start), the sequencer function, and the alive function, which tells when the sequence ends. No harm for now, we promised to abstract away the source part, we didn’t promise that we will do it as a single parameter abstraction.

Let’s turn back to our question of whether best_in_sequence affords further abstractions. All we need is to apply our test and ask: do we listen to sources only to pick the best item according to some criterion? Or can there be other reasons why we listen to a source? What if we were asked to take the sum of all the integers encountered in reducing a seed to 1 by Collatz function? This is a slightly different problem than above, but cannot be solved by our function. We need a new function, perhaps something like this:

def sum_in_sequence(n, sequencer, alive):
    """Return the sum of numbers in the sequence seeded by n using sequencer function. """
    total = 0
    while alive(n):
        total += n
        n = sequencer(n)
    return total

What if we wanted the product of the numbers in the sequence? Mean? Geometric mean? Would you write separate functions for each? You can of course, but you should not, actually you need not. Let’s first get rid of our dependence on the particular summation business. We will have a store (or accumulator) that we will update at each turn of our loop, and the update will be done by a function, passed as a parameter, that will apply to the current item in the sequence and the current value of the store.

def accumulate_in_sequence(n, sequencer, alive, update):
    """Return the accumulated value in the sequence seeded by n using sequencer function. """
    acc = 0
    while alive(n):
        acc = update(acc, n)
        n = sequencer(n)
    return acc

Now if you want to add numbers in a Collatz sequence, have your update function and other parameters ready. Write the update function before proceeding.5

def add(acc, item):
  """Return the sum of acc and item."""
  return acc + item

def is_alive(n):
  """Return True if n is not 1."""
  return n != 1

def collatz(n):
  """Return the next number in the Collatz sequence."""
  if n % 2 == 0:
      return n // 2
  else:
      return 3 * n + 1

and call your function like this:

>>> accumulate_in_sequence(7, collatz, is_alive, add)

What if you want to find the largest number in the sequence? Is it doable with accumulate_in_sequence? Or do you need to use the good old best_in_sequence? Think.

It is perfectly doable, even fun to do. You do not even need to write a new update function. Simply pass Python’s max:

>>> accumulate_in_sequence(7, collatz, is_alive, max)

One final touch, in case it escaped your eye till now: our accumulate_in_sequence function initializes the store to 0. This is fine for summation, but not for other tasks. Make the initial state of your store a parameter. Here is the final version, with a more general name for processing sequences:

def proc_seq(n, sequencer, alive, update, init):
    """Return the accumulated value in the sequence seeded by n using sequencer function. """
    store = init
    while alive(n):
        store = update(store, n)
        n = sequencer(n)
    return store

Here is an important question that you can use to test your understanding of what is going on here, I mean regarding higher order functions and abstraction: How would you, now, solve the problem of finding the largest positive integer \(n\) with the largest Collatz length s.t. \(1 < n < k\) for a given \(k\)? I give you 1000 (\(k\)) and you return me the largest positive smaller integer with a Collatz length not smaller than any other you can find in the range from 1 to 1000.

With the power of higher-order functions, we gradually formed a general purpose, highly abstract function that can be used in an enormous variety of computational tasks.

There will surely be complaints about having a five parameter function. Given its power, five is not bad at all, but you can reduce it for most cases by keyword parameters, which is a topic for another day.

  1. Don’t you wonder whether there is a unique largest integer in a Collatz sequence? Is it possible to visit a number you’ve already visited in a Collatz sequence? 

  2. Do not confuse proper divisors with prime divisors of a number. Any positive divisor of \(n\) other than \(n\) itself is a proper divisor. Prime divisors of \(n\), on the other hand, are divisors of \(n\) that are prime. For example, 12 has the proper divisors 1, 2, 3, 4 and 6, and the prime divisors 2 and 3. What are the proper and prime divisors of 7? 

  3. Try 30 as seed. 

  4. Yes, I know this is a silly task. 

  5. You don’t need to write it actually; you can simply import it by from operator import add