Sequences and iterables — selkie.pyx.seq
List-producing functions
The functions as_list(), repeatable(), concat(), unique(),
and cross_product() produce lists as output.
As list.
The function as_list() converts any item x to a list.
If x is None, it returns the empty list.
If x is a list, it returns x itself.
If x is a sequence, it returns list(*x*).
If x is has attribute “next,” it takes x to be a
generator and returns list(*x*).
Otherwise, it returns [*x*].
Repeatable.
A generator can only be used once, whereas iterables such as lists,
tuples, sets, and dicts can be iterated over multiple times.
The function repeatable() converts a generator into a list, but
leaves other iterables alone. It coerces None to the empty
list, but otherwise signals an error if its input is not an iterable.
It assumes that any object with a next attribute is a generator,
and any object with an __iter__
attribute is an iterable.
Unique.
The function unique() takes a list as input and produces a list
with all duplicates removed. The list does not need to be sorted, nor
do duplicates need to be adjacent to each other. The algorithm is
naive (quadratic), so it is only appropriate for short lists.
>>> from selkie.pyx.seq import unique
>>> unique([4, 2, 4, 1, 2])
[4, 2, 1]
Cross product.
The function cross_product() takes a single argument, a list of
lists, and produces the cross product of those lists as output.
>>> from selkie.pyx.seq import cross_product
>>> cross_product([['a', 'b'], [1, 2], [42]])
[('a', 1, 42), ('a', 2, 42), ('b', 1, 42), ('b', 2, 42)]
Sorted lists
The functions intersect, union, and difference
expect sorted lists as input. Their behavior is unpredictable if they
are given unsorted lists.
>>> from selkie.pyx.seq import intersect, union, difference
>>> x = [1,3,5,6,7]
>>> y = [2,3,4,7,8]
>>> intersect(x,y)
[3, 7]
>>> union(x,y)
[1, 2, 3, 4, 5, 6, 7, 8]
>>> difference(x,y)
[1, 5, 6]
>>> difference(y,x)
[2, 4, 8]
Queue
A Queue is a first-in first-out queue. The method write()
inserts an element at the tail of the queue, and read() removes
and returns the element at the head of the queue.
It is implemented as a buffer with head and tail pointers. Initially the buffer is empty. If the tail is at the end of the buffer, new elements are appended to the buffer and the buffer grows. When the queue is empty, the head and tail are reset to 0.
Space in the buffer before the head is “wasted” space. If the
wasted space exceeds a threshold (maxwaste), the contents of the
queue are relocated so that the head is 0. One can specify maxwaste
when creating the queue; by default it is 10. Setting maxwaste
to None prevents the
contents from being relocated (though the head and tail will still be
reset to 0 if the queue becomes empty).
The elements in the queue can be accessed and set by index.
Edit distance
Edit distance works with sequences generally, not just strings. To create and use an edit distance function:
>>> from selkie.pyx.seq import EditDistance
>>> distance = EditDistance()
>>> distance('testy', 'tezt')
2
By default, it uses the function simple_distance(), which
imposes a cost of 1 for each insertion, substitution, or deletion.
One can provide a new cost function when instantiating
EditDistance. A cost function should take two arguments, x
and y, representing a deletion if y is None, and
insertion if x is None, and a substitution, if neither is
None. The return value should be a number, possibly math.inf.
Generators
The following functions are provided that relate to generators:
chain(),
nth(), head(), tail(), more(),
product(), count(), and counts().
For the purpose of illustration, let us define a little generator:
>>> def pots ():
... for i in range(11):
... yield 2**i
...
>>> list(pots())
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
Chain.
The function chain() is imported from itertools.
It concatenates multiple generators.
Nth.
The function nth() returns a particular item from an iterator.
>>> from selkie.pyx.seq import nth
>>> nth(pots(), 2)
4
Remember that an iterable is consumed as one iterates through it. In
the example just given, we created a new generator by calling pots().
If we use its value, though, we need to be careful:
>>> iter = pots()
>>> nth(iter, 2)
4
>>> list(iter)
[8, 16, 32, 64, 128, 256, 512, 1024]
Note that nth() consumed the first three items.
One use of nth is to jump to problematic cases in a large
iteration. An idiom for finding such cases in the first place is the
following:
for i, x in enumerate(myiteration):
if isproblematic(x):
return i
Head, tail.
The functions head() and tail() are also provided for
inspecting parts of a large iterable.
>>> from selkie.pyx.seq import head, tail
>>> head(pots())
[1, 2, 4, 8, 16]
>>> tail(pots())
[64, 128, 256, 512, 1024]
An optional argument specifies how many items one would like to have:
>>> head(pots(), 3)
[1, 2, 4]
>>> tail(pots(), 3)
[256, 512, 1024]
A more general function is islice, from the standard
itertools module.
>>> from itertools import islice
>>> list(islice(pots(), 2, 5))
[4, 8, 16]
Product.
The function product() is analogous to sum(). It takes an
iterable containing numbers, and returns the product of the numbers.
Count.
The function count() is analogous to len(), except that it
works for generators as well as lists and other iterables.
>>> from selkie.pyx.seq import count
>>> count(pots())
11
Note that count is unrelated to itertools.count(). The
latter returns an infinite iterator that generates the natural
numbers.
Counts.
The function counts() creates a table of counts of occurrences.
>>> from selkie.pyx.seq import counts
>>> tab = counts('abracadabra')
>>> sorted(tab.items())
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
Dups.
dups(iter) returns an iteration over the duplicates in iter.
Module Documentation
Coercion
- selkie.pyx.seq.as_list(x)
Converts x to a list.
None becomes the empty list.
Tuples, sets, frozensets, ranges, dicts, and iterables are passed to list(). (An iterable is anything with method
__next__().)Anything else is wrapped in a (single-element) list.
- selkie.pyx.seq.single(x)
Takes an iterable and returns its sole element. Signals an error if the iterable does not contain exactly one element.
- selkie.pyx.seq.repeatable(x)
The return value is something that has method
__iter__().If x is None: returns the empty list.
If x already has method
__iter__(), it is returned.If x has method
__next__(), theniter(x)is returned.Otherwise an error is signalled.
Sequence operations
- selkie.pyx.seq.concat(lists)
Concatenates multiple lists.
- selkie.pyx.seq.unique(elts)
Eliminates duplicates from a list. Otherwise preserves order.
- selkie.pyx.seq.cross_product(lists)
Cross product.
- class selkie.pyx.seq.EditDistance(cost=<function simple_cost>)
One optionally provides a loss function when instantiating EditDistance. The loss function should take two items and return 0 if they are identical and a number greater than 0 if not. The default is 0-1 loss. An instance of EditDistance is a function that returns the edit distance between two sequences.
Sorted lists
- selkie.pyx.seq.uniq(sortedlst)
Eliminates duplicates, assuming that the list is sorted.
- selkie.pyx.seq.intersect(list1, list2)
Intersects two sorted lists. Unpredictable results if the lists are not sorted.
- selkie.pyx.seq.union(list1, list2)
Takes the union of two sorted lists. Unpredictable results if the lists are not sorted.
- selkie.pyx.seq.difference(list1, list2)
Returns the set difference of two sorted lists. Unpredictable results if the lists are not sorted.
Queue and Index
- class selkie.pyx.seq.Queue(maxwaste=10)
A queue. It uses a circular buffer. The
write()method adds an object to the end of the queue. Theread()method takes an object from the head of the queue.
- class selkie.pyx.seq.Index(*pairs)
A specialization of dict in which keys are associated with lists of values instead of single values. Setting a value actually appends it to the list values. The method
add()is available as a synonym. The methodcount(key)returns the number of values for the given key, anddelete(key,val)deletes one of the values, leaving the others in place.
Data processing
- selkie.pyx.seq.nth(iter, n)
Returns the n-th item of iterable g, counting from 0, and counting from the current position of the iterator’s “read head.” For example:
>>> from selkie.pyx.seq import nth >>> import itertools >>> c = itertools.count(0) >>> nth(c, 3) 3 >>> nth(c, 3) 7
The iterator I{c} generates the natural numbers, beginning with 0. The first call to C{nth} returns the fourth item, which is the number 3. The second call begins where the first left off, and returns the fourth item, which is the number 7.
Note that one can achieve the same functionality this way:
>>> c = itertools.count(0) >>> next(itertools.islice(c, 3, 4)) 3
One use of nth is to jump to problematic cases in a large iteration. An idiom for finding such cases in the first place is the following:
for (i, x) in enumerate(myiteration): if isproblematic(x): return i
- selkie.pyx.seq.head(iter, n=5)
Returns a list containing the first n elements from the iteration.
- selkie.pyx.seq.tail(iter, n=5)
Returns a list containing the last n elements from the iteration.
- selkie.pyx.seq.product(nums)
The product of a list of numbers.
- selkie.pyx.seq.count(iter)
Counts up how many items are contained in (or remain in) the given iterable.
- selkie.pyx.seq.counts(iterable)
Creates a map whose keys are items in the given iterable, and whose value for a given item is the frequency of occurrence of that item in the iteration. All items are consumed.
Mixins
- class selkie.pyx.seq.LazyList(iterable)
An abstract class. Its
__init__()method requires an iterable. Elements will be fetched from the iterable only on demand, so one should be cautious about changing the underlying element source during the lifetime of the LazyList.