Prerequisites: The power of the state and The way of lambda

We fortified our proc_seq with tuples to keep track of multiple state variables. We scan through a sequence of numbers that proceeds according to a given rule, keeping track of various properties of the sequence, like what is the largest number seen so far according to a given criterion, or what is the sum of all numbers seen so far. We are able to scan through the sequence in pairs, triples, or more, and can keep record of multiple properties of the sequence. We can add as many state variables as we want, but these variables can hold only single values.

We are still unable to handle the task of mapping a positive integer to the Collatz sequence it seeds. To do that, we need a state variable that can hold a sequence of numbers, not just a single number. Lists exist for this purpose. Here is a brief interlude on lists.


Construction and indexing of lists is similar to tuples:

>>> x = [1, 2, 3, 4, 5]
>>> x
[1, 2, 3, 4, 5]
>>> x[0]  # first element
1
>>> x[3]	# fourth element
4
>>> x[-1]  # last element
5
>>> x[1:]  # all but the first element
[2, 3, 4, 5]

Lists are mutable, meaning that we can change them in place. Here is how:

>>> x = [1, 2, 3, 4, 5]
>>> x
[1, 2, 3, 4, 5]
>>> x.append(7)
>>> x
[1, 2, 3, 4, 5, 7]
>>> x.insert(2, 6)  # insert 6 at index 2
>>> x
[1, 2, 6, 3, 4, 5, 7]
>>> x.extend([8,9])  # extend by another list
>>> x
[1, 2, 6, 3, 4, 5, 7, 8, 9]

Operations like .insert, .extend, .append, and so on, are not expressions. They do now have any return value, but have side-effects that modify the list in place. Operations like these cause too much pain in programs, causing errors usually very hard to detect. We are not fond of side-effects, we like expressions that return values. Here are the expression equivalents of the above operations:

>>> x = [1, 2, 3, 4, 5]
>>> x
[1, 2, 3, 4, 5]
>>> x + [7]  # concatenation
[1, 2, 3, 4, 5, 7]
>>> x[:2] + [6] + x[2:]  # insert 6 at index 2
[1, 2, 6, 3, 4, 5]
>>> x + [8,9]  # extend by another list
[1, 2, 3, 4, 5, 8, 9]
>>> x
[1, 2, 3, 4, 5]

As you see, the list x remains unchanged after these operations. To actually change x, you need to assign the result back to x; unless you do that the original list remains unchanged.

Finally, here are some useful functions on lists:

>>> x = [1, 2, 3]
>>> len(x)  # length of the list
3
>>> 2 in x  # membership test
True

Back to Collatz sequences. Here is how we can use lists to construct the sequence seeded by a positive integer \(n\):

def collatz_sequence(n):
    """Return the Collatz sequence seeded by positive integer n.

    >>> collatz_sequence(6)
    [6, 3, 10, 5, 16, 8, 4, 2, 1]
    >>> collatz_sequence(11)
    [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
    """
    from collatz import collatz
    return proc_seq(
        n,
        seq=collatz,
        alive=lambda x: x > 1,
        update= lambda store, x: store + [x],
        init = []
    )

The post The power of the state closed:

There still remains a limitation in our notion of state which cannot be overcome by increasing the memory of the store component. You will find what it is in another post.

Now is time to make good on our promise.

Our sequence processor proc_seq keeps the sequence variable n and the store variable store separate. This separation does no harm under the assumption that the next item in the sequence depends only on the current item through a function of the users’ choice, passed to proc_seq via the seq parameter. But what if the next item in the sequence depends, perhaps among other things, on some information kept in the store parameter?

For an instance, take the task of generating 10 random numbers between 1 and 100, such that no number appears twice in the generated sequence. The update task should not be hard to come by:

lambda store, x: store + [x] if x not in store else store

But how will you know that you have generated 10 unique numbers, so that it is time to stop? All you can check with alive is the index parameter n. The problem is that alive and update cannot communicate. You might think this would work:

def unique_random_numbers(n):
    """Return a list of n unique random numbers between 1 and 100.

    >>> unique_random_numbers(10)
    [23, 5, 67, 89, 12, 34, 45, 78, 90, 11]
    """
    from random import randint
    return proc_seq(
        None,
        seq=lambda x: randint(1, 100),
        alive=lambda x: len(store) < n,
        update=lambda store, x: store + [x] if x not in store else store,
        init=[]
    )

but this will give an unbound variable error, because store is not defined in the scope of alive. alive is a parameter of proc_seq, and parameters cannot refer to variables in the body of the function, it works just the other way round.

There is only one way out of this impasse.1 You need a more powerful state, but this time the extra power does not come from memory capacity but the structure of the state. A more general state architecture opens the way to more flexible communication between state parameters.

We had very hard time to get used to proc_seq, and I guess, or hope, some of us has even started to like it, a bit. But it is time to say “good bye” to proc_seq. From here on we will have proc, which makes use of a single state variable that can be structured and manipulated in full flexibility:

def proc(state, alive, update):
    """Process as succession of states.

    state: initial state
    alive: one-place function that tests whether state is alive
    update: function that updates the state
    """
    while alive(state):
        state = update(state)
    return state 

Let’s have an example.

Return a list of n unique random numbers each between start and end. To tackle the problem, we need to design our state and how to update it. The design is fairly simple when we operate with a single state that serves as argument to both alive and update:

def n_uniq_random(n, start, end):
    """Return a list of n unique random numbers between start and end.

    >>> import random
    >>> random.seed(48)
    >>> n_uniq_random_dict(5, 1, 10)
    [9, 6, 3, 5, 4]
    """
    from random import randint
    from funcutils import proc

    def update(s):
        x = randint(start, end)
        return s + [x] if x not in s else s

    return proc(state= [],
                alive=lambda s: len(s) < n,
                update=update)

Let’s get back to the problem of counting the turning points in the Collatz sequence seeded by \(n\), and re-write it with proc:

def collatz_turning_points(n):
    """Return the number of turning points in the Collatz sequence seeded by n."""
    def update(state):
        count, two_back, one_back, current = state
        return (count + int(turning_point(two_back, one_back, current)),
                one_back,
                current,
                collatz(current))

    return proc(state=(0,n,n,n),
                alive=lambda s: s[3] != 1,
                update=update)[0]

Our state has 4 parameters in total and the meaning of each parameter depends on the position of the parameter in the 4-tuple. Even with 4 parameters, dealing with positions is error-prone. We are not good at remembering numbers. But we have no problem in understanding words. We will now see how to explicitly name our state parameters, through a brief interlude on dictionaries.

Dictionaries are mappings from keys to values. When we say mapping, we mean that each key is associated with one and only one value. Keys are usually strings, which can be formed by double or single quotes. Here is how to create and use dictionaries:

>>> d = {'a': 1, 'b': 2, 'c': 3}
>>> d
{'a': 1, 'b': 2, 'c': 3}
>>> d['a']  # value associated with key 'a'
1
>>> d['b'] = 5  # change value associated with key 'b'
>>> d
{'a': 1, 'b': 5, 'c': 3}

This much dictionary knowledge is enough for now.


We can now re-write collatz_turning_points using a dictionary state:

def collatz_turning_points(n):
    """Return the number of turning points in the Collatz sequence seeded by n."""
    return proc(state={"count": 0, "two_back": n, "one_back": n, "current": n},
                alive=lambda s: s["current"] != 1,
                update=lambda s: {"count": s["count"] + int(turning_point(s["two_back"],
                                                                            s["one_back"],
                                                                            s["current"])),
                                   "two_back": s["one_back"],
                                   "one_back": s["current"],
                                   "current": collatz(s["current"])})["count"]

You may find this too wordy in comparison to the tuple based solution. In the tuple version it is more likely that the indices get jammed. More importantly that sort of errors are hard to detect, as they might not cause an error during execution. Making errors in dictionary keys will be more likely to be detected by the interpreter, as “key not found” errors. But in the end, all this depends on the complexity of the problem/solution at hand. In sum, if wordiness pays back, be wordy, if it does not, go with the more concise tuple strategy.

  1. Two actually; you can also ditch proc_seq and shift to a while loop solution, but not in this course, sorry.