Pseudo-projective parsing ************************* Pseudo-projective parsing involves a transformation applied to a set of CoNLL sentences, and an inverse transformation applied to the output of the parser. The transformation converts the stemma to a projective stemma. Nonprojectivity arises exactly when we have two crossing arcs, instead of proper nesting. Toplevel -------- Stats ..... The function print_stats() runs the projectivizer and reverter on a list of sentences, and reports the results. For example::: >>> from seal.nlp.dp.nnproj import print_stats >>> from seal.data import dep >>> print_stats(dep.sents('dan.orig', 'test')) Projective: 280 / 322 (86.956522%) Not projective: 42 / 322 (13.043478%) Not projective: Revertible: 39 / 42 (92.857143%) Not revertible: 3 / 42 (7.142857%) Revertible: 1 lifts: 33 2 lifts: 3 3 lifts: 1 4 lifts: 2 Not revertible: 1 lifts: 1 2 lifts: 2 Nivre & Nilsson's algorithm ........................... An arc (*g,d*) is defined to be nonprojective just in case it crosses another arc *(g',d')* and *g'* dominates *g*. Find the shortest nonprojective arc *(g,d)*, breaking ties in favor of leftmost arcs. Lift (*g,d*) by replacing it with (*h,d*), where *h* is the governor of *g*. Continue until there are no nonprojective arcs. Functions ......... Let us use the following sentence as a running example:: >>> from seal.core.config import ex >>> from seal.nlp.dep import conll_sents >>> s = next(conll_sents(ex.depsent2)) >>> print(s) 0 *root* _ _ _ _ 1 a pos a/pos A 2 2 b pos b/pos B 4 3 c pos c/pos C 2 4 d pos d/pos D 0 5 e pos e/pos E 7 6 f pos f/pos F 3 7 g pos g/pos G 0 8 h pos h/pos H 7 >>> govrs = s.column('govr') The following functions provided by seal.depparse apply to governor lists. **Dominates** determines whether a given word dominates another. Domination is reflexive and transitive:: >>> from seal.nlp.dp.nnproj import dominates >>> dominates(4, 1, govrs) True >>> dominates(4, 5, govrs) False **Is nonproj** determines whether an arc is nonprojective or not. An arc (*g,d*) is defined to be nonprojective just in case any word between *g* and *d* (exclusive) has a governor that is outside the range (*g,d*):: >>> from seal.nlp.dp.nnproj import is_nonproj >>> is_nonproj((3,6), govrs) True **Has nonproj arcs** returns True if there are any nonprojective arcs in the sentence:: >>> from seal.nlp.dp.nnproj import has_nonproj_arcs >>> has_nonproj_arcs(govrs) True **Nonproj arcs** returns an iterator over the nonprojective arcs in the sentence:: >>> from seal.nlp.dp.nnproj import nonproj_arcs >>> list(nonproj_arcs(govrs)) [(7, 5), (3, 6)] **Next nonproj arc** returns the nonprojective arc with the smallest span. It breaks ties in favor of the leftmost arc:: >>> from seal.nlp.dp.nnproj import next_nonproj_arc >>> next_nonproj_arc(govrs) (7, 5) Projectivizer ------------- Projectivizer functions ....................... **Projectivize** takes either a sentence or iterator over sentences, and returns the same type of object:: >>> from seal.nlp.dp.nnproj import projectivize >>> ps = projectivize(s) >>> print(ps) 0 *root* _ _ _ _ 1 a pos a/pos A 2 2 b pos b/pos B 4 3 c pos c/pos C 2 4 d pos d/pos D 0 5 e pos e/pos G|E 0 6 f pos f/pos C|F 0 7 g pos g/pos G 0 8 h pos h/pos H 7 The return value is a copy; the original sentence is not modified. The projectivizer only modifies non-projective arcs, so if the original sentence is already projective, the new sentence is identical to the old. **Revert** takes a projectivized sentence, or iterator over sentences, and attempts to reconstruct the original:: >>> from seal.nlp.dp.nnproj import revert >>> rs = revert(ps) >>> rs == s True **Stats** takes a sentence or an iterator over sentences. For a single sentence, it projectives and then attempts to revert the sentence. It then returns a pair (*rev, nlifts*) where *rev* is either 'revertible' or 'not-revertible' and *nlifts* is the number of lifts performed during projectivization. (Zero lifts means that the original was already projective.) For example::: >>> from seal.nlp.dp.nnproj import stats >>> stats(s) ('revertible', 4) For a list of sentences, stats() returns a table mapping the stats produced for single sentences to the list of indices of sentences that have those stats. (Note that it uses *sent*.index(), not the actual position of the sentence in the input list.):: >>> sents = dep.sents('dan.orig', 'test') >>> tab = stats(sents) >>> for (k,v) in sorted(tab.items()): ... print(k, len(v)) ... ('not-revertible', 1) 1 ('not-revertible', 2) 2 ('revertible', 0) 280 ('revertible', 1) 33 ('revertible', 2) 3 ('revertible', 3) 1 ('revertible', 4) 2 >>> tab['not-revertible', 2] [131, 198] Projectivizer implementation ............................ A Projectivizer implements the Nivre & Nilsson algorithm:: >>> from seal.nlp.dp.nnproj import Projectivizer >>> p = Projectivizer() It implements the following methods. **Set sent** sets p.orig to a given sentence. It initializes p.govrs and p.roles to be copies of the corresponding columns of the sentence. It initializes p.lifted to be a list containing False for each word in the sentence. And it initializes p.nlifts to 0:: >>> p.set_sent(s) >>> print(p) (2, 1) (4, 2) (2, 3) (0, 4) (7, 5) (3, 6) (0, 7) (7, 8) Note that printing a projectivizer lists the arcs represented by its govrs. **Lift** takes an arc (*g,d*) as input. It changes the governor of *d* to be the governor of *g*:: >>> p.lift((7,5)) >>> print(p) (2, 1) (4, 2) (2, 3) (0, 4) (0, 5) (3, 6) (0, 7) (7, 8) The governor of 7 is 0, so the arc (7,5) has been replaced with (0,5). **Run** repeatedly chooses the next arc and lifts it, until there are no more nonprojective arcs. It returns the resulting list of governors:: >>> p.run() >>> print(p) (2, 1) (4, 2) (2, 3) (0, 4) (0, 5) (0, 6) (0, 7) (7, 8) **Sentence** returns an updated CoNLL sentence:: >>> print(p.sentence()) 0 *root* _ _ _ _ 1 a pos a/pos A 2 2 b pos b/pos B 4 3 c pos c/pos C 2 4 d pos d/pos D 0 5 e pos e/pos G|E 0 6 f pos f/pos C|F 0 7 g pos g/pos G 0 8 h pos h/pos H 7 **Calling** a projectivizer as a function calls set_sent() and run(), and then calls sentence() to generate a projectivized sentence. However, if the input sentence already contains p-governors, it immediately returns the input sentence. To be precise, it returns a triple (*s,p,n*) where *s* is the projectivized sentence, *p* is True if the projectivized sentence is in fact the original sentence, and *n* is the number of lifts performed, or None if the output is the original sentence. Reverter -------- The function set_sent() initializes the reverter with a new sentence:: >>> from seal.nlp.dp.nnproj import Reverter >>> r = Reverter() >>> r.set_sent(ps) >>> print(r) 0 None None 1 2 A 2 4 B 3 2 C 4 0 D 5 0 G|E 6 0 C|F 7 0 G 8 7 H The method find_govr() is given arguments *root* and *role*, and does a breadth-first search starting at *root* to find a word whose role is *role.*:: >>> r.find_govr(0, 'G') 7 The method lower() is given a word *d* as input. It calls find_govr() to find a new governor for *d*, and reattaches *d* to the new governor:: >>> r.lower(5) >>> print(r) 0 None None 1 2 A 2 4 B 3 2 C 4 0 D 5 7 E 6 0 C|F 7 0 G 8 7 H The method run() goes through the sentence from left to right. It calls lower() on each word whose role contains a vertical bar:: >>> r.run() >>> print(r) 0 None None 1 2 A 2 4 B 3 2 C 4 0 D 5 7 E 6 3 F 7 0 G 8 7 H The method sentence() returns a sentence whose govrs and roles are taken from the current state of the reverter. Pgovrs and proles are empty:: >>> rs = r.sentence() >>> rs == s True Calling the reverter as a function does set_sent() and run(), and returns sentence().