Iteration by while
Let’s start with a simple function defined over integers called the Collatz function \(C\).
\[C(n) = \begin{cases} \frac{n}{2} & \text{if } n \text{ is even}\\ 3n + 1 & \text{if } n \text{ is odd} \end{cases}\]Collatz’ conjecture states that for any positive integer \(n\), repeated application of \(C\) will eventually reach 1. The Collatz sequence of a given positive integer \(n\) is the sequence of numbers obtained by repeatedly applying \(C\) starting from \(n\) until reaching 1. We will call \(n\) the seed of the sequence, and also use phrases like “a (Collatz) sequence seeded by \(n\)”, and so on. When we talk about the Collatz sequence of a number, we will include the starting number and the ending 1 in the sequence, this is good because by looking at the sequence, you immediately see how it started and that it is completed. Finally, we will use the phrase “Collatz sequence seeded by \(n\)” to refer to the Collatz sequence generated to reduce \(n\) to 1.
Check: Write the Collatz sequence seeded by 11.
One can be interested in many questions about Collatz sequence. For one:
Question: Given an upper bound \(k \geq 1\), find the seed \(n \leq k\) that produces the longest Collatz sequence.
The first question to ask about a programming problem is what are the inputs and outputs of the function we want to write. Here, the input is a positive integer \(k\), and the output is another positive integer \(n \leq k\) that produces the longest Collatz sequence among all integers from 1 to \(k\).
Once this global input-output relation is clear, the next step is to think about how to break this relation into smaller parts, or simpler problems that we can solve one by one to reach the final solution. At this point, a strategy that works quite well in most cases is to ask “What do I have to be able to do in order to solve this problem?”
Let’s list:
- Given a seed \(n\), we need to be able to compute the Collatz sequence seeded by \(n\).
- We need to be able to compute the length of a given sequence.
- We need to be able to compare lengths of multiple sequences and come up with the longest sequence.
- Finding the longest sequence is not enough, we also need to remember the seed that produced that sequence.
We can start with the first one. What is the input-output relation for this subproblem? Input is a positive integer \(n\), and output is the Collatz sequence seeded by \(n\), which is a list (or sequence) of integers. We have a problem here since we do not know yet constructing lists in Python. But we can at least print the numbers in the Collatz sequence one by one. Here is how we can do that using a while loop:
def print_collatz_sequence(n):
"""Print the Collatz sequence seeded by n."""
while n != 1:
print(n)
if n % 2 == 0: # checks whether n is even
n = n // 2
else:
n = 3 * n + 1
print(1) # Print the last element of the sequence
There is a very important principle in good programming that we need to follow as tightly as we can:
Principle: Each function should do one thing.
Think for a moment whether print_collatz_sequence follows this principle. It might appear to be so. But actually, it does two things. It computes the Collatz function for numbers within a loop, and it prints the numbers. That there are two separable tasks might not be obvious to you for the moment, but here is a simple test: What if we wanted to add the numbers in a Collatz sequence rather than print them on screen? That code would be very similar to the one we wrote. There would be the exact same or very similar lines for computing the next number in the sequence. The following bit:
if n % 2 == 0: # checks whether n is even
n = n // 2
else:
n = 3 * n + 1
The only difference would be that instead of printing the number, we would add it to a running total. This shows that our function is doing two things: computing the next number in the sequence, and printing it. Here is a useful principle:
Principle: Examining the function you wrote, if you discover blocks of code (like the if statement above) that might be useful for other tasks as well, separate that block into another function and use that function instead of the block.
In our example, this principle dictates us to write a separate function to apply the Collatz function:
def collatz(n):
"""The Collatz function
>>> collatz(0)
0
>>> collatz(3)
10
>>> collatz(8)
4
>>> collatz(-8)
-4
"""
if n%2==0:
return n//2 # // instead of / for an integer result
else:
return 3*n+1
and handle the printing of the sequence with:
def print_collatz_sequence(n):
"""Print the Collatz sequence seeded by n.
>>> print_collatz_sequence(8)
8
4
2
1
"""
while n != 1:
print(n)
n = collatz(n)
print(1) # Print the last element of the sequence
What we actually want is the length of the sequence, not its display on the screen. To get the length we will have to violate our principle of doing one thing at a time in a function. But this is only a temporary thing, once we learn how to form the sequences themselves, rather than their prints on the screen, we will get back to our principle. For now, counting has to be done like this:
def collatz_length(n):
"""The length of the Collatz sequence seeded by n (n and 1 are included)
>>> collatz_length(8)
4
>>> collatz_length(3)
8
"""
counter = 1
while n != 1:
counter += 1
n = collatz(n)
return counter
We seem to have all the ingredients we need to solve our original problem. But let’s do not rush. Let’s try to abstract a little away from our problem. The problem can be thought as a composition of two components: 1. There is a message source that generates numbers in order. You can also think of it as a stream, or a channel. 1. There is a listener (or receiver) that listens to the channel and keeps the record of the largest number received so far.
The generating source can vary from problem to problem, but the listener has a more general character. For example, the source could have been random integers between, say 1 and 1000000 (a million), and assume we were asked to find the largest such integer in \(k\) random integers. The code would look like:
def largest_random(k):
"""Return the largest of k ranndom integers between 1 and 1000000."""
from random import randint
largest_so_far = 0
counter = 1
while counter <= k:
r = randint(1, 1000000)
if r > largest_so_far:
largest_so_far = r
counter += 1
return largest_so_far
The function randint is imported from the random module, and generates a random integer between the two arguments (inclusive).
If our problem were to find the largest length rather than the seed that yields the largest length, we would have a very similar function:
def largest_collatz_length(k):
"""The largest Collatz length for the seeds between 1 and k."""
largest_so_far = 0
counter = 1
while counter <= k:
current_length = collatz_length(counter)
if current_length > largest_so_far:
largest_so_far = current_length
counter += 1
return largest_so_far
Ours is a little more complicated than this. We need to remember the seed that produced the largest length as well. Here is the final code:
def maximal_collatz_seed(k):
"""The largest seed between 1 and k that produces the largest Collatz length."""
largest_length = 0
seed_with_largest_length = 0
counter = 1
while counter <= k:
current_length = collatz_length(counter)
if current_length > largest_length:
largest_length = current_length
seed_with_largest_length = counter
counter += 1
return seed_with_largest_length
Once again we could not follow our principle of “one thing at a time” in the above function. The source component and listener component are fused together. In the earlier case we were helpless because we did not know how to construct sequences. If we knew we could have separated the construction and length components. The reason we are helpless in the current case is that we do not know about higher order functions – functions that take other functions as arguments. Once we learn about them, we will be able to separate the source and listener components.
Finally, notice that I wrote “The largest seed between 1 and k that produces the largest Collatz length” in the docstring. Why is that? Why did I say “the largest seed”?