Sequences and iterables — ``selkie.pyx.seq`` ============================================ List-producing functions ------------------------ The functions ``as_list()``, ``repeatable()``, ``concat()``, ``unique()``, and ``cross_product()`` produce lists as output. **As list.** The function ``as_list()`` converts any item *x* to a list. If *x* is ``None``, it returns the empty list. If *x* is a list, it returns *x* itself. If *x* is a sequence, it returns ``list(*x*)``. If *x* is has attribute "``next``," it takes *x* to be a generator and returns ``list(*x*)``. Otherwise, it returns ``[*x*]``. **Repeatable.** A generator can only be used once, whereas iterables such as lists, tuples, sets, and dicts can be iterated over multiple times. The function ``repeatable()`` converts a generator into a list, but leaves other iterables alone. It coerces ``None`` to the empty list, but otherwise signals an error if its input is not an iterable. It assumes that any object with a ``next`` attribute is a generator, and any object with an ``__iter__`` attribute is an iterable. **Unique.** The function ``unique()`` takes a list as input and produces a list with all duplicates removed. The list does not need to be sorted, nor do duplicates need to be adjacent to each other. The algorithm is naive (quadratic), so it is only appropriate for short lists. >>> from selkie.pyx.seq import unique >>> unique([4, 2, 4, 1, 2]) [4, 2, 1] **Cross product.** The function ``cross_product()`` takes a single argument, a list of lists, and produces the cross product of those lists as output. >>> from selkie.pyx.seq import cross_product >>> cross_product([['a', 'b'], [1, 2], [42]]) [('a', 1, 42), ('a', 2, 42), ('b', 1, 42), ('b', 2, 42)] Sorted lists ------------ The functions ``intersect``, ``union``, and ``difference`` expect sorted lists as input. Their behavior is unpredictable if they are given unsorted lists. >>> from selkie.pyx.seq import intersect, union, difference >>> x = [1,3,5,6,7] >>> y = [2,3,4,7,8] >>> intersect(x,y) [3, 7] >>> union(x,y) [1, 2, 3, 4, 5, 6, 7, 8] >>> difference(x,y) [1, 5, 6] >>> difference(y,x) [2, 4, 8] Queue ----- A ``Queue`` is a first-in first-out queue. The method ``write()`` inserts an element at the tail of the queue, and ``read()`` removes and returns the element at the head of the queue. It is implemented as a buffer with head and tail pointers. Initially the buffer is empty. If the tail is at the end of the buffer, new elements are appended to the buffer and the buffer grows. When the queue is empty, the head and tail are reset to 0. Space in the buffer before the head is "wasted" space. If the wasted space exceeds a threshold (``maxwaste``), the contents of the queue are relocated so that the head is 0. One can specify ``maxwaste`` when creating the queue; by default it is 10. Setting ``maxwaste`` to ``None`` prevents the contents from being relocated (though the head and tail will still be reset to 0 if the queue becomes empty). The elements in the queue can be accessed and set by index. Edit distance ------------- Edit distance works with sequences generally, not just strings. To create and use an edit distance function: >>> from selkie.pyx.seq import EditDistance >>> distance = EditDistance() >>> distance('testy', 'tezt') 2 By default, it uses the function ``simple_distance()``, which imposes a cost of 1 for each insertion, substitution, or deletion. One can provide a new cost function when instantiating EditDistance. A cost function should take two arguments, *x* and *y*, representing a deletion if *y* is None, and insertion if *x* is None, and a substitution, if neither is None. The return value should be a number, possibly math.inf. Generators ---------- The following functions are provided that relate to generators: ``chain()``, ``nth()``, ``head()``, ``tail()``, ``more()``, ``product()``, ``count()``, and ``counts()``. For the purpose of illustration, let us define a little generator: >>> def pots (): ... for i in range(11): ... yield 2**i ... >>> list(pots()) [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024] **Chain.** The function ``chain()`` is imported from ``itertools``. It concatenates multiple generators. **Nth.** The function ``nth()`` returns a particular item from an iterator. >>> from selkie.pyx.seq import nth >>> nth(pots(), 2) 4 Remember that an iterable is consumed as one iterates through it. In the example just given, we created a new generator by calling ``pots()``. If we use its value, though, we need to be careful: >>> iter = pots() >>> nth(iter, 2) 4 >>> list(iter) [8, 16, 32, 64, 128, 256, 512, 1024] Note that ``nth()`` consumed the first three items. One use of ``nth`` is to jump to problematic cases in a large iteration. An idiom for finding such cases in the first place is the following:: for i, x in enumerate(myiteration): if isproblematic(x): return i **Head, tail.** The functions ``head()`` and ``tail()`` are also provided for inspecting parts of a large iterable. >>> from selkie.pyx.seq import head, tail >>> head(pots()) [1, 2, 4, 8, 16] >>> tail(pots()) [64, 128, 256, 512, 1024] An optional argument specifies how many items one would like to have: >>> head(pots(), 3) [1, 2, 4] >>> tail(pots(), 3) [256, 512, 1024] A more general function is ``islice``, from the standard ``itertools`` module. >>> from itertools import islice >>> list(islice(pots(), 2, 5)) [4, 8, 16] **Product.** The function ``product()`` is analogous to ``sum()``. It takes an iterable containing numbers, and returns the product of the numbers. **Count.** The function ``count()`` is analogous to ``len()``, except that it works for generators as well as lists and other iterables. >>> from selkie.pyx.seq import count >>> count(pots()) 11 Note that ``count`` is unrelated to ``itertools.count()``. The latter returns an infinite iterator that generates the natural numbers. **Counts.** The function ``counts()`` creates a table of counts of occurrences. >>> from selkie.pyx.seq import counts >>> tab = counts('abracadabra') >>> sorted(tab.items()) [('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)] **Dups.** ``dups(iter)`` returns an iteration over the duplicates in *iter*. Module Documentation -------------------- .. automodule:: selkie.pyx.seq Coercion ........ .. autofunction:: as_list .. autofunction:: single .. autofunction:: repeatable Sequence operations ................... .. autofunction:: concat .. autofunction:: unique .. autofunction:: cross_product .. autoclass:: EditDistance Sorted lists ............ .. autofunction:: uniq .. autofunction:: intersect .. autofunction:: union .. autofunction:: difference Queue and Index ............... .. autoclass:: Queue .. autoclass:: Index Data processing ............... .. autofunction:: nth .. autofunction:: head .. autofunction:: tail .. autofunction:: product .. autofunction:: count .. autofunction:: counts Mixins ...... .. autoclass:: LazyList