Sequences and iterables — selkie.pyx.seq

List-producing functions

The functions as_list(), repeatable(), concat(), unique(), and cross_product() produce lists as output.

As list. The function as_list() converts any item x to a list.

If x is None, it returns the empty list. If x is a list, it returns x itself. If x is a sequence, it returns list(*x*). If x is has attribute “next,” it takes x to be a generator and returns list(*x*). Otherwise, it returns [*x*].

Repeatable. A generator can only be used once, whereas iterables such as lists, tuples, sets, and dicts can be iterated over multiple times. The function repeatable() converts a generator into a list, but leaves other iterables alone. It coerces None to the empty list, but otherwise signals an error if its input is not an iterable. It assumes that any object with a next attribute is a generator, and any object with an __iter__ attribute is an iterable.

Unique. The function unique() takes a list as input and produces a list with all duplicates removed. The list does not need to be sorted, nor do duplicates need to be adjacent to each other. The algorithm is naive (quadratic), so it is only appropriate for short lists.

>>> from selkie.pyx.seq import unique
>>> unique([4, 2, 4, 1, 2])
[4, 2, 1]

Cross product. The function cross_product() takes a single argument, a list of lists, and produces the cross product of those lists as output.

>>> from selkie.pyx.seq import cross_product
>>> cross_product([['a', 'b'], [1, 2], [42]])
[('a', 1, 42), ('a', 2, 42), ('b', 1, 42), ('b', 2, 42)]

Sorted lists

The functions intersect, union, and difference expect sorted lists as input. Their behavior is unpredictable if they are given unsorted lists.

>>> from selkie.pyx.seq import intersect, union, difference
>>> x = [1,3,5,6,7]
>>> y = [2,3,4,7,8]
>>> intersect(x,y)
[3, 7]
>>> union(x,y)
[1, 2, 3, 4, 5, 6, 7, 8]
>>> difference(x,y)
[1, 5, 6]
>>> difference(y,x)
[2, 4, 8]

Queue

A Queue is a first-in first-out queue. The method write() inserts an element at the tail of the queue, and read() removes and returns the element at the head of the queue.

It is implemented as a buffer with head and tail pointers. Initially the buffer is empty. If the tail is at the end of the buffer, new elements are appended to the buffer and the buffer grows. When the queue is empty, the head and tail are reset to 0.

Space in the buffer before the head is “wasted” space. If the wasted space exceeds a threshold (maxwaste), the contents of the queue are relocated so that the head is 0. One can specify maxwaste when creating the queue; by default it is 10. Setting maxwaste to None prevents the contents from being relocated (though the head and tail will still be reset to 0 if the queue becomes empty).

The elements in the queue can be accessed and set by index.

Edit distance

Edit distance works with sequences generally, not just strings. To create and use an edit distance function:

>>> from selkie.pyx.seq import EditDistance
>>> distance = EditDistance()
>>> distance('testy', 'tezt')
2

By default, it uses the function simple_distance(), which imposes a cost of 1 for each insertion, substitution, or deletion. One can provide a new cost function when instantiating EditDistance. A cost function should take two arguments, x and y, representing a deletion if y is None, and insertion if x is None, and a substitution, if neither is None. The return value should be a number, possibly math.inf.

Generators

The following functions are provided that relate to generators: chain(), nth(), head(), tail(), more(), product(), count(), and counts().

For the purpose of illustration, let us define a little generator:

>>> def pots ():
...     for i in range(11):
...         yield 2**i
...
>>> list(pots())
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

Chain. The function chain() is imported from itertools. It concatenates multiple generators.

Nth. The function nth() returns a particular item from an iterator.

>>> from selkie.pyx.seq import nth
>>> nth(pots(), 2)
4

Remember that an iterable is consumed as one iterates through it. In the example just given, we created a new generator by calling pots(). If we use its value, though, we need to be careful:

>>> iter = pots()
>>> nth(iter, 2)
4
>>> list(iter)
[8, 16, 32, 64, 128, 256, 512, 1024]

Note that nth() consumed the first three items. One use of nth is to jump to problematic cases in a large iteration. An idiom for finding such cases in the first place is the following:

for i, x in enumerate(myiteration):
    if isproblematic(x):
        return i

Head, tail. The functions head() and tail() are also provided for inspecting parts of a large iterable.

>>> from selkie.pyx.seq import head, tail
>>> head(pots())
[1, 2, 4, 8, 16]
>>> tail(pots())
[64, 128, 256, 512, 1024]

An optional argument specifies how many items one would like to have:

>>> head(pots(), 3)
[1, 2, 4]
>>> tail(pots(), 3)
[256, 512, 1024]

A more general function is islice, from the standard itertools module.

>>> from itertools import islice
>>> list(islice(pots(), 2, 5))
[4, 8, 16]

Product. The function product() is analogous to sum(). It takes an iterable containing numbers, and returns the product of the numbers.

Count. The function count() is analogous to len(), except that it works for generators as well as lists and other iterables.

>>> from selkie.pyx.seq import count
>>> count(pots())
11

Note that count is unrelated to itertools.count(). The latter returns an infinite iterator that generates the natural numbers.

Counts. The function counts() creates a table of counts of occurrences.

>>> from selkie.pyx.seq import counts
>>> tab = counts('abracadabra')
>>> sorted(tab.items())
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]

Dups. dups(iter) returns an iteration over the duplicates in iter.

Module Documentation

Coercion

selkie.pyx.seq.as_list(x)

Converts x to a list.

  • None becomes the empty list.

  • Tuples, sets, frozensets, ranges, dicts, and iterables are passed to list(). (An iterable is anything with method __next__().)

  • Anything else is wrapped in a (single-element) list.

selkie.pyx.seq.single(x)

Takes an iterable and returns its sole element. Signals an error if the iterable does not contain exactly one element.

selkie.pyx.seq.repeatable(x)

The return value is something that has method __iter__().

  • If x is None: returns the empty list.

  • If x already has method __iter__(), it is returned.

  • If x has method __next__(), then iter(x) is returned.

  • Otherwise an error is signalled.

Sequence operations

selkie.pyx.seq.concat(lists)

Concatenates multiple lists.

selkie.pyx.seq.unique(elts)

Eliminates duplicates from a list. Otherwise preserves order.

selkie.pyx.seq.cross_product(lists)

Cross product.

class selkie.pyx.seq.EditDistance(cost=<function simple_cost>)

One optionally provides a loss function when instantiating EditDistance. The loss function should take two items and return 0 if they are identical and a number greater than 0 if not. The default is 0-1 loss. An instance of EditDistance is a function that returns the edit distance between two sequences.

Sorted lists

selkie.pyx.seq.uniq(sortedlst)

Eliminates duplicates, assuming that the list is sorted.

selkie.pyx.seq.intersect(list1, list2)

Intersects two sorted lists. Unpredictable results if the lists are not sorted.

selkie.pyx.seq.union(list1, list2)

Takes the union of two sorted lists. Unpredictable results if the lists are not sorted.

selkie.pyx.seq.difference(list1, list2)

Returns the set difference of two sorted lists. Unpredictable results if the lists are not sorted.

Queue and Index

class selkie.pyx.seq.Queue(maxwaste=10)

A queue. It uses a circular buffer. The write() method adds an object to the end of the queue. The read() method takes an object from the head of the queue.

class selkie.pyx.seq.Index(*pairs)

A specialization of dict in which keys are associated with lists of values instead of single values. Setting a value actually appends it to the list values. The method add() is available as a synonym. The method count(key) returns the number of values for the given key, and delete(key,val) deletes one of the values, leaving the others in place.

Data processing

selkie.pyx.seq.nth(iter, n)

Returns the n-th item of iterable g, counting from 0, and counting from the current position of the iterator’s “read head.” For example:

>>> from selkie.pyx.seq import nth
>>> import itertools
>>> c = itertools.count(0)
>>> nth(c, 3)
3
>>> nth(c, 3)
7

The iterator I{c} generates the natural numbers, beginning with 0. The first call to C{nth} returns the fourth item, which is the number 3. The second call begins where the first left off, and returns the fourth item, which is the number 7.

Note that one can achieve the same functionality this way:

>>> c = itertools.count(0)
>>> next(itertools.islice(c, 3, 4))
3

One use of nth is to jump to problematic cases in a large iteration. An idiom for finding such cases in the first place is the following:

for (i, x) in enumerate(myiteration):
    if isproblematic(x):
        return i
selkie.pyx.seq.head(iter, n=5)

Returns a list containing the first n elements from the iteration.

selkie.pyx.seq.tail(iter, n=5)

Returns a list containing the last n elements from the iteration.

selkie.pyx.seq.product(nums)

The product of a list of numbers.

selkie.pyx.seq.count(iter)

Counts up how many items are contained in (or remain in) the given iterable.

selkie.pyx.seq.counts(iterable)

Creates a map whose keys are items in the given iterable, and whose value for a given item is the frequency of occurrence of that item in the iteration. All items are consumed.

Mixins

class selkie.pyx.seq.LazyList(iterable)

An abstract class. Its __init__() method requires an iterable. Elements will be fetched from the iterable only on demand, so one should be cautious about changing the underlying element source during the lifetime of the LazyList.