.. automodule:: selkie.nlp.tree Trees — ``selkie.nlp.tree`` =========================== Using Trees ----------- Creating a little tree:: from selkie.nlp.tree import Tree fido = Tree('Name', word='Fido') chases = Tree('V', word='chases') spot = Tree('Name', word='Spot') vp = Tree('VP', [chases, spot]) s = Tree('S', [fido, vp]) A category with features can be created this way:: from selkie.nlp.features import Category, atomset cat = Category(['np', 'fem', atomset(['du','pl'])]) The Tree class -------------- .. py:class:: selkie.nlp.tree.Tree The nodes of a tree are represented by instances of the class ``Tree``. There is no separate node class: a node and the tree rooted at the node are both represented by a ``Tree`` instance. .. py:attribute:: word A string or None. .. py:attribute:: children A list or None. May be ``[]``. .. py:attribute:: nld Number of left dependents. .. py:attribute:: role This node's role. .. py:attribute:: cat Category. .. py:attribute:: sem Semantic translation or None. .. py:attribute:: id An identifier. .. py:attribute:: parent The parent node, if ``set_parents()`` was done. .. method:: __init__([cat], [children], **kwargs) Members can be specified as keywords. .. method:: copy(**kwargs) Members can be specified as keywords. .. method:: __getitem__(i) I-th node in preorder walk. .. method:: __iter__() Preorder walk. .. method:: __str__() Pretty-printed. Basic node types ---------------- We wish to accommodate the nodes that occur in three kinds of trees: unheaded phrase-structure trees, headed phrase-structure trees, and dependency trees. The attributes ``word``, ``children``, and ``role`` are used to distinguish among node types. **Interior and leaf.** The ``children`` attribute distinguishes between *interior* nodes and *leaf* nodes. The former have children; the latter do not. Governors and phrasal nodes are interior nodes. An interior node is *headed* if one of its children has the role ``'head'``, and it is *unheaded* otherwise. **Lexical and nonlexical.** A node that has a (boolean True) value for ``word`` is *lexical*; otherwise the node is *nonlexical*. Taking the cross product yields the following five node types: +-----------------+-------+---------------------------------+ | | Leaf | Interior | | | +---------------+-----------------+ | | | Headed | Unheaded | +=================+=======+===============+=================+ | **Nonlexical**: | empty | headed phrase | unheaded phrase | +-----------------+-------+---------------+-----------------+ | **Lexical**: | word | governor | +-----------------+-------+---------------------------------+ For a governor, a child with role ``'head'`` has no special status. **Terminal and nonterminal.** The interior-leaf distinction is *not* the same as the terminal-nonterminal distinction. The latter is a property of categories, as determined by a grammar. Terminal categories are not allowed to appear on the lefthand side of a rewrite rule, whereas nonterminal categories that are not useless appear on the lefthand side of at least one rewrite rule. It is possible to have leaf nodes with nonterminal categories: such nodes are *null expansions.* A tree generated by a constituent-structure grammar cannot have interior nodes labeled with terminal categories, but dependency grammars make no terminal-nonterminal distinction, and permit trees in which interior nodes are labeled with parts of speech, which would be terminal categories in a constituent-structure grammar. **Leaf words and empty leaves.** The ``word`` attribute distinguishes between leaf words and empty leaves. The former have a non-null value for ``word``, and the latter do not. Note that the expression *leaf word* is not redundant in a dependency tree: leaf words contrast with *governors,* which are interior nodes with a value for ``word``. However, in a constituency tree, only leaf nodes have values for ``word``, so in that context we can refer to leaf words simply as *words.* Depending on the kind of category it has, an empty leaf may represent either a null terminal (like an empty complementizer) or a null expansion (corresponding to a rewrite rule with nothing on the right-hand side). We are careful not to refer to null terminals as "words," reserving the term *word* for a node with a non-null value for ``word``. **Governor versus phrase.** The ``word`` attribute also distinguishes between governors and phrases. Both are interior nodes; the former has a non-null value for ``word``, while the latter does not. That is, a *governor* is a node that has non-null values for both ``children`` and ``word``. Governors are used in dependency trees; their children are called *dependents.* By contrast, a *phrase* or *phrasal node* has children but no word. Phrasal nodes are used in constituency trees. **Heads.** We further subdivide phrasal nodes according to whether they have heads or not. The head of a phrasal node is defined to be a child whose ``role`` is "``head``." A phrasal node with a head is a headed phrase, and a phrasal node without a head is an unheaded phrase. Governors and leaves are by definition headless. Other attributes ---------------- **Number of left dependents.** The ``nld`` attribute is relevant only for governors. It indicates the number of left dependents. In the terminal string, the governor is ordered after its left dependents and before its right dependents. If a phrasal node or leaf has a value for ``nld``, the value is ignored. **Parent.** The ``parent`` attribute is set by the function ``set_parents()``. It permits one to navigate not only down a tree, but also back up again. **Cat.** The ``cat`` attribute represents the syntactic category of the tree. The category may be anything, though strings and ``Category`` instances are the commonest choices. **Role.** The links in a dependency tree are often labeled. The link label indicates the relationship between governor and dependent, such as "subject" or "object." The same relationship can be useful in constituent trees as indicating the **role** of a child relative to its parent (or the head of its parent). As already mentioned, the role "``head``" has a special status if the parent is a phrasal node. **ID.** Nodes are sometimes assigned identifiers, such as the indices used to encode movement relations or control. **Sem.** The value for ``sem`` is the semantic translation of the node. Example ------- Here is an example of constructing a tree manually, by constructing individual nodes. The first two arguments to the constructor are the category and a list of children. A word may be specified using the keyword "``word``." >>> from selkie.nlp.tree import Tree >>> det = Tree('Det', word='the') >>> n = Tree('N', word='dog') >>> np = Tree('NP', [det, n]) Here are examples for the three main attributes. >>> np.cat 'NP' >>> np.children [, ] >>> np.word >>> det.word 'the' One can set ``role`` and ``id``: >>> det.role = 'spec' >>> n.role = 'head' >>> np.id = 1 The ``nld`` attribute is only relevant for dependency trees: see below under "Dependents." One can print out the tree rooted at a node using a ``print`` statement: >>> print(np) 0 (NP &1 1 (Det:spec the) 2 (N:head dog)) Notice that nodes are numbered. One can access them directly by number: >>> np[2] This is particularly useful for large trees. Copy ---- The method ``copy()`` makes a shallow copy of a node. If the original node has children, a fresh copy of the child list is made, but the child nodes themselves are not copied. >>> y = np.copy() >>> y is np False >>> y.children is np.children False >>> y.children == np.children True >>> y.children[0] is np.children[0] True One can modify any of the attributes ``cat``, ``children``, ``word``, ``nld``, ``role``, or ``id`` when making the copy. >>> z = np.copy(children=[]) >>> print(z) 0 (NP &1) Node functions -------------- The ``Tree`` class has few methods. Instead, there is a large collection of functions that are intended to work with any object (though not all of them are fully general yet). .. py:function:: getcat(n) Category or None. .. py:function:: getchildren(n) List or ``None``. .. py:function:: getparent(n) Node or ``None``. .. py:function:: getword(n) String or ``None``. .. py:function:: getnld(n) Integer or ``None``. .. py:function:: getrole(n) Role or ``None``. .. py:function:: getid(n) ID or ``None``. .. py:function:: getsem(n) Semantic translation or ``None``. .. py:function:: is_interior(n) Coerce children to boolean. .. py:function:: is_leaf(n) Non-false *n* with boolean false children. .. py:function:: is_governor(n) Non-false children, non-false word. .. py:function:: is_phrase(n) Non-false children, boolean false word. .. py:function:: is_headed_phrase(n) Phrase with head child. .. py:function:: is_unheaded_phrase(n) Phrase with no head child. .. py:function:: is_leaf_word(n) Leaf, non-false word. .. py:function:: is_empty_leaf(n) Leaf, boolean false word. .. py:function:: is_unary(n) Phrase, one child. .. py:function:: nodetype(n) String representing node type. .. py:function:: head_child(n) A node or ``None``. .. py:function:: head_index(n) An integer or ``None``. .. py:function:: child_index(n, c) Integer; *c*'s index in children. .. py:function:: left_dependents(n) List of nodes. .. py:function:: right_dependents(n) List of nodes. .. py:function:: expansion(n) List of nodes or ``None``. .. py:function:: delete_child(n, i) Delete a child. Accessors --------- Instead of using the attributes directly, it is best to use the accessor functions ``getword()``, ``getchildren()``, ``getnld()``, ``getrole()``, ``getcat()``, ``getsem()``, ``getid()``, and ``getparent()``. These functions can be applied to arbitrary objects, not just ``Tree`` instances. If called on something that lacks the attribute in question, they return ``None``. There is one exception: if a string is passed to ``getword()``, it returns the string itself. (Hence a string behaves like a leaf node that has a value for ``word`` but has no category.) Some examples: >>> from selkie.nlp.tree import getcat, getword >>> getcat(np) 'NP' >>> getcat('hi') >>> getword(det) 'the' >>> getword('hi') 'hi' Predicates ---------- **Basic predicates.** The following functions are available to test for properties of a node: ``is_interior()``, ``is_leaf()``, ``is_governor()``, ``is_phrase()``, ``is_headed_phrase()``, ``is_unheaded_phrase()``, ``is_leaf_word()``, ``is_empty_leaf()``, They have all been previously discussed. Some examples: >>> from selkie.nlp.tree import is_interior, is_leaf, is_headed_phrase >>> is_interior('hi') False >>> is_leaf(det) True >>> is_leaf('hi') True >>> is_headed_phrase(np) True **Is empty.** The function ``is_empty()`` tests whether a node is empty or not. This is technically not a property of the node itself, but of the tree rooted at the node: a node is empty just in case neither it nor any of its descendants has a value for ``word``. >>> from selkie.nlp.tree import is_empty >>> is_empty(Tree()) True >>> is_empty(Tree('NP', [Tree('N')])) True >>> is_empty(Tree('NP', [Tree('N', word='dog')])) False **Is unary.** The function ``is_unary()`` returns true just in case the node has exactly one child. >>> from selkie.nlp.tree import is_unary >>> is_unary(np) False >>> is_unary(det) False >>> is_unary(Tree('NP', [Tree('N', word='rice')])) True **Node type.** The function ``nodetype()`` returns one of the following: ``'leaf'``, ``'governor'``, ``'unheaded phrase'``, or ``'headed phrase'``. >>> from selkie.nlp.tree import nodetype >>> nodetype(np) 'headed phrase' >>> nodetype(det) 'leaf' >>> nodetype('hi') 'leaf' Structural access ----------------- **Head child.** The function ``head_child()`` returns the child whose role is "``head``," if one exists. (If there is more than one, it returns only the first.) >>> from selkie.nlp.tree import head_child >>> head_child(np) **Head index.** The function ``head_index()`` returns the head child's index in the ``children`` list. It returns *-1* if there is no head child. Children are numbered from 0. >>> from selkie.nlp.tree import head_index >>> head_index(np) 1 >>> head_index('hi') -1 **Child index.** The function ``child_index()`` takes two arguments, parent and child, and returns the index of the child in the parent's ``children`` list. It returns *-1* if the child is not found. >>> from selkie.nlp.tree import child_index >>> child_index(np, det) 0 >>> child_index(np, 'foo') -1 **Dependents.** If the node has a value for ``nld``, the function ``left_dependents()`` returns all children up to, but not including, ``nld``. The function ``right_dependents()`` returns all remaining children. If the node has no value for ``nld``, but it does have a head child, then ``left_dependents()`` returns all children preceding the head child, and ``right_dependents()`` returns all children following the head child. If the node has neither ``nld`` nor a head child, both functions signal an error. >>> from selkie.nlp.tree import left_dependents, right_dependents >>> left_dependents(np) [] >>> right_dependents(np) [] >>> sbj = Tree('N', word='dogs') >>> v = Tree('V', word='chase') >>> obj = Tree('N', word='cats') >>> v.children = [sbj, obj] >>> v.nld = 1 >>> left_dependents(v) [] >>> right_dependents(v) [] **Expansion.** If a node has children, the function ``expansion()`` returns a tuple consisting of the node's category followed by the categories of its children. Some of the categories may be ``None``. If the node has no children, the return value is ``None``. >>> from selkie.nlp.tree import expansion >>> expansion(np) ('NP', 'Det', 'N') Destructive ----------- The function ``delete_child()`` takes a node and a child index, and deletes the child at that index. The value for ``nld`` is adjusted, if necessary. There is no return value. >>> from selkie.nlp.tree import delete_child >>> delete_child(v,0) >>> left_dependents(v) [] >>> right_dependents(v) [] Trees ----- The functions that treat a ``Tree`` instance as representing a complete tree (rather than just a node) are summarized as follows. .. py:function:: is_headed_tree(t) All interior nodes have heads .. py:function:: is_unheaded_tree(t) All interior nodes lack heads .. py:function:: is_dependency_tree(t) All interior nodes have words. .. py:function:: treetype(n) A string representing the type. .. py:function:: load_trees(fn) Returns a list of trees. .. py:function:: parse_trees(s) Same, but *s* is string. .. py:function:: parse_tree(s) Single tree, else error. .. py:function:: iter_trees(fn) Iteration over trees. .. py:function:: tree_string(t) Pretty-printed. .. py:function:: print_tree(t) Prints tree string. .. py:function:: save_trees(t, fn) Saves tree string to file. .. py:function:: load_tabular_trees(fn) List of trees. .. py:function:: iter_tabular_trees(fn) Iteration over trees. .. py:function:: save_tabular_trees(ts, fn) Write to file. Tree types ---------- The type of a tree is defined by the type of interior nodes it contains. A tree is an unheaded phrase-structure tree if all its interior nodes are unheaded phrasal nodes. A tree is a headed phrase-structure tree if all its interior nodes are headed phrasal nodes. A tree is a dependency tree if all its interior nodes are governors. A hybrid tree is one that satisfies none of these three definitions. All three types of tree contain identical leaf nodes. They differ only in their interior nodes. Technically, a tree containing no interior nodes (i.e., consisting of a single terminal node) satisfies all three definitions. The following functions test tree types: >>> from selkie.nlp.tree import is_headed_tree, is_unheaded_tree, is_dependency_tree >>> is_headed_tree(np) True >>> is_unheaded_tree(np) False >>> is_dependency_tree(v) True The function ``treetype()`` returns the tree type: ``'headed phrase'``, ``'unheaded phrase'``, or ``'governor'`` (the lattermost for a dependency tree). It returns ``'leaf'`` if the tree consists of a single leaf node, and ``None`` if the tree is hybrid. >>> from selkie.nlp.tree import treetype >>> treetype(np) 'headed phrase' >>> treetype(v) 'governor' >>> treetype(det) 'leaf' Load and parse -------------- **Iter trees.** The function ``iter_trees()`` reads trees in a lisp-like format from a file or string. Like all the load/save functions, ``iter_trees()`` takes its argument to name a file if it is a ``Fn``, and to provide the contents, if it is a string. Here is an example: >>> from selkie.data import ex >>> from selkie.nlp.tree import iter_trees >>> ts = iter_trees(ex('tree2')) >>> next(ts) >>> print(_) 0 (S 1 (NP:subj &1 foo 2 (Det the) 3 (N dog)) 4 (VP:head &2 5 (V chased) 6 (NP:dobj 7 (Det the) 8 (N cat)))) **Load and parse.** The function ``load_trees()`` simply returns:: list(iter_trees(fn)) The function ``parse_trees()`` also dispatches to ``iter_trees()``, but it wraps its arguments in a ``Contents`` instance (from ``selkie.nlp.io``) so that the argument is interpreted as a string representing a tree, rather than a filename. The function ``parse_tree()`` returns a single tree instead of a list of trees; it signals an error if its argument does not parse as a single tree. >>> from selkie.nlp.tree import parse_tree >>> foo = parse_tree(''' ... (NP:subj&1 foo ... (Det the) ... (N:head dog)) ... ''') >>> print(foo) 0 (NP:subj &1 foo 1 (Det the) 2 (N:head dog)) Print and save -------------- **Pretty-print string.** The function ``tree_string()`` takes a tree and returns a pretty-printed string. >>> from selkie.nlp.tree import tree_string >>> tree_string(foo) '0 (NP:subj &1\n foo\n1 (Det the)\n2 (N:head dog))' >>> print(_) 0 (NP:subj &1 foo 1 (Det the) 2 (N:head dog)) The ``Tree`` method ``__str__()`` simply dispatches to ``tree_string()``. One can suppress the node numbers by specifying ``numerate=False``. >>> print(tree_string(foo, numerate=False)) (NP:subj &1 foo (Det the) (N:head dog)) The string-valued attributes (``word``, ``cat``, ``role``, ``id``) are formatted without surrounding quotes unless they contain whitespace or one of the following special characters: left or right parenthesis, left or right square bracket, colon, or single or double quote. For example: >>> print(Tree('Multi\x20N', id='a/b', word='hors d\'oeuvre')) 0 ('Multi N' "hors d'oeuvre" &a/b) **Save trees.** The function ``save_trees()`` takes an iterator over trees and a filename, and prints the trees to the named file. >>> from tempfile import TemporaryDirectory >>> from os.path import join >>> from selkie.nlp.io import load_string >>> from selkie.nlp.tree import save_trees >>> with TemporaryDirectory() as dirname: ... fn = join(dirname, 'foo') ... save_trees([foo], fn) ... print(load_string(fn), end='') ... (NP:subj &1 foo (Det the) (N:head dog)) As with all of the "save" functions, the filename is option; if omitted, it collects and returns its output as a string. >>> s = save_trees([foo]) >>> print(s, end='') (NP:subj &1 foo (Det the) (N:head dog)) Tabular tree files ------------------ There is also a tabular format for representing trees in a file. An example is provided by the file ``t1``, whose contents are:: >>> from selkie.nlp.io import contents >>> print(contents(ex('t1')), end='') # doctest: +NORMALIZE_WHITESPACE [ S [ NP + Det the + N cat ] [ VP + V chased [ NP + Det the + N dog ] ] ] A record may contain up to six fields: .. list-table:: :widths: 1 2 6 * - 1 - Record type - Left bracket for the beginning of a nonterminal node, right bracket for the end of a nonterminal node, and plus for a terminal node. * - 2 - Category - Syntactic category. * - 3 - Word - It may not contain a tab or newline, but any other character (including space) is allowed. * - 4 - Role - A symbol representing the relation between the node and its parent or governor. * - 5 - Head - A numeric index, identifying either a particular child, or a position among the children. * - 6 - ID - A numeric index for the node. None of the fields is obligatory. Additional fields beyond these six are also permitted, but ignored. The function ``iter_tabular_trees()`` can be used to read a file in tabular tree format. It is a generator over trees: >>> from selkie.nlp.tree import iter_tabular_trees >>> t1 = next(iter_tabular_trees(ex('t1'))) >>> print(t1) 0 (S 1 (NP 2 (Det the) 3 (N cat)) 4 (VP 5 (V chased) 6 (NP 7 (Det the) 8 (N dog)))) The function ``load_tabular_trees()`` converts the generator into a list. Conversely, the function ``save_tabular_trees()`` takes a tree iterator and a filename, and saves the trees in tabular format. >>> from selkie.nlp.tree import save_tabular_trees >>> with TemporaryDirectory() as dirname: ... fn = join(dirname, 'foo') ... save_tabular_trees([foo], fn) ... for line in open(fn): print(repr(line)) ... '[\tNP\tfoo\tsubj\t0\t1\n' '+\tDet\tthe\t\t\t\n' '+\tN\tdog\thead\t\t\n' ']\n' Drawing ------- The function ``draw_tree()`` draws a tree. It requires the package "graphviz" to be installed. It takes a second argument, which is the filename to write. If the filename is omitted, a temp file is written and displayed to the screen. Tree iterations --------------- If one iterates directly over a tree, one iterates over its nodes in preorder. The iteration functions and related functions are summarized as follows. .. py:function:: preorder(n) An iteration over nodes. .. py:function:: textorder(n) An iteration over nodes. .. py:function:: iter_nodes(t) A preorder walk. .. py:function:: nodes(t) A list, in preorder. .. py:function:: iter_edges(t) Parents in preorder, children in order. .. py:function:: edges(t) A list. .. py:function:: subtrees(t, f) Highest nodes that satisfy *f* (list). .. py:function:: iter_subtrees(t, f) Iteration. .. py:function:: subtree(t, f) Error if not unique. .. py:function:: paths(t) Paths in tree as slash-joined cats. .. py:function:: leaves(t) All leaves. .. py:function:: words(t) Non-empty leaves. .. py:function:: tagged_words(t) List of (word, cat) pairs. .. py:function:: terminal_string(t) Space-joined string. .. py:function:: is_empty(t) Yields empty string. .. py:function:: is_efree_tree(t) All nodes have children or word. .. py:function:: is_unaryfree_tree(t) No unary expansions. .. py:function:: copy_tree(t) Deep copy. .. py:function:: delete_nodes(t, cs) Delete nodes with cat in *cs*. .. py:function:: eliminate_epsilons(n) Elminate empty nodes, destructive. .. py:function:: set_parents(t) Detructive. .. py:function:: getroot(n) Root above node. Preorder and text order walks ----------------------------- A walk is an iteration over the nodes of a tree. Two different walks are defined: ``preorder()`` and ``textorder()``. In a preorder walk, one visits a node before any of its children. For phrase-structure trees, text order is identical to preorder, but for dependency trees, they differ. More precisely, in a text-order walk, any node that has a value for ``nld`` is visited after visiting its left dependents, but before visiting its right dependents. To illustrate, we create two trees. The first is a headed phrasal tree: >>> from selkie.nlp.tree import parse_tree >>> h = parse_tree('''(S (NP (Det the) (N:head dog)) ... (VP:head (V:head barked)) ... (Adv loudly))''') ... The second is a dependency tree: >>> d = parse_tree('(V (N (Det the) dog) barked (Adv loudly))') We can confirm that the preorder and text order walks for the phrasal tree are the same: >>> from selkie.nlp.tree import preorder, textorder >>> for node in preorder(h): print(repr(node)) ... >>> for node in textorder(h): print(repr(node)) ... But they differ for the dependency tree: >>> for node in preorder(d): print(repr(node)) ... >>> for node in textorder(d): print(repr(node)) ... Nodes and edges --------------- The ``__iter__()`` method of a tree, and the function ``iter_nodes()``, are both synonyms for ``preorder()``. The function ``nodes()`` turns the generator into a list. An *edge* is a pair *(p,c)* where *p* is a parent node and *c* is one of its children. The function ``iter_edges()`` returns an iteration over the edges in a tree. The function ``edges()`` turns the iteration into a list. Subtrees -------- **Subtrees, iter subtrees.** The function ``iter_subtrees()`` returns an iteration over subtrees that satisfy a given predicate. The function ``subtrees()`` turns the iteration into a list. The difference between these functions and simply filtering the output of ``nodes()`` is that ``iter_subtrees()`` terminates the recursion whenever it finds a node matching the predicate. >>> from selkie.nlp.tree import subtrees >>> subtrees(h, lambda x: is_leaf(head_child(x))) [, ] If the second argument is a string, it is taken to be the desired node category. >>> subtrees(h, 'Adv') [] **Subtree.** The function ``subtree()`` takes the list produced by ``subtrees()`` and returns its member, if there is exactly one. It signals an error if the list is not a singleton list. Paths and leaves ---------------- **Paths.** The function ``paths()`` returns the list of paths through the tree. A path is represented by a string in which node categories are separted by "/." The categories in the path are ordered from root to leaf. >>> from selkie.nlp.tree import paths >>> paths(h) ['S/NP/Det', 'S/NP/N', 'S/VP/V', 'S/Adv'] **Leaves.** The function ``leaves()`` returns the list of leaf nodes in a tree. The leaves are listed in preorder. **Words.** The function ``words()`` differs from ``leaves()`` in two ways: it only includes leaves that have a value for ``word``, and it uses a text-order walk. >>> from selkie.nlp.tree import words >>> words(h) ['the', 'dog', 'barked', 'loudly'] **Tagged words.** The function ``tagged_words()`` is like ``words()``, except that it produces a list of pairs of form (*word, cat*). >>> from selkie.nlp.tree import tagged_words >>> tagged_words(h) [('the', 'Det'), ('dog', 'N'), ('barked', 'V'), ('loudly', 'Adv')] **Terminal string.** The function ``terminal_string()`` takes the output of ``words()`` and turns it into a string. The words are separated by spaces. >>> from selkie.nlp.tree import terminal_string >>> terminal_string(h) 'the dog barked loudly' Predicates ---------- **Is e-free.** The function ``is_efree_tree()`` returns true just in case the tree contains no empty nodes. **Is unary-free.** The function ``is_unaryfree_tree()`` returns true just in case there are no unary-branching nodes in the tree. Copy tree --------- The function ``copy_tree()`` does a deep copy of a tree. Unlike the node method ``copy()``, ``copy_tree()`` does recurse through the whole tree, making copies of all nodes. Transformations --------------- The operations described in this section, as well as the transformations described in the chapters on head-marking and stemma conversion, are destructive. To protect a tree, make a copy before applying destructive operations. **Delete nodes.** The function ``delete_nodes()`` deletes all nodes with a given category. (However, it never deletes the root node.) **Eliminate epsilons.** The function ``eliminate_epsilons()`` eliminates all empty nodes from a tree. If the tree was initially headed, any heads that are empty will get deleted. >>> from selkie.nlp.tree import eliminate_epsilons >>> e = parse_tree(''' ... (S ... (NP (N )) ... (VP ... (VBZ ) ... (RB surely) ... (NP Fido))) ... ''') >>> eliminate_epsilons(e) >>> print(e) 0 (S 1 (VP 2 (RB surely) 3 (NP Fido))) **Set parents, get root.** The function ``set_parents()`` destructively adds a ``parent`` attribute to every node in the tree, pointing back to the node's parent. After parents have been set in a tree, one can use the function ``getroot()`` to go from any node to the root node. It follows parent links up the tree to the root node. Tree builder ------------ .. py:class:: selkie.nlp.tree.TreeBuilder A stack-like data structure for constructing a tree. .. py:method:: start(c, w, [r, i]) Start a new interior node. Cat, word, role, id. .. py:method:: middle() This is the head position. .. py:method:: end() Pop node. .. py:method:: leaf(c, w, [r, i]) Create a leaf node. .. py:method:: trees() A list; error if not finished. .. py:method:: tree() A tree; error if not finished and unique. Here is an example of use: >>> from selkie.nlp.tree import TreeBuilder >>> tb = TreeBuilder() >>> tb.start('NP') >>> tb = TreeBuilder() >>> tb.start('S') >>> tb.start('NP', role='subj') >>> tb.leaf('Det', 'the') >>> tb.leaf('N', 'dog') >>> tb.end() >>> tb.start('VP', role='head') >>> tb.leaf('V', 'barks', role='head') >>> tb.end() >>> tb.end() >>> tb.tree() >>> print(_) 0 (S 1 (NP:subj 2 (Det the) 3 (N dog)) 4 (VP:head 5 (V:head barks))) The methods for building a phrasal node are ``start()`` and ``end()``. Both return the node. To build a dependency node, one also uses the method ``middle()`` to mark the position at which the governor occurs. For example: >>> tb.start('V', word='chase') >>> tb.leaf('N', 'dogs') >>> tb.middle() >>> tb.leaf('N', 'cats') >>> tb.end() >>> tb.tree() >>> print(_) 0 (V 1 (N dogs) chase 2 (N cats)) The builder allows one to construct multiple trees; it saves them on a list until one calls either ``tree()`` or ``trees()``. The latter returns the list of trees constructed. The former returns a single tree, and signals an error if there is not exactly one tree on the list. Both methods signal an error if there is an incomplete tree in progress. Both methods restore the builder to its empty state.