Evaluation — ``selkie.nlp.dp.eval``
***********************************

The functions described below are in the module ``selkie.nlp.dp.eval``.
The examples assume the following imports::

   >>> from selkie.nlp.dp.eval import *
   >>> from selkie import ex
   >>> from selkie.dep import conll_sents

evaluate
--------

This is the main function.  It takes a parser and a list of sentences
with gold pgovrs and proles, and prints out evaluation information.
The parser should place its output in the govr and role slots, not
pgovr and prole.  Punctuation tokens are ignored by default; one may
specify ``excludepunc=False`` to count them.  One may provide
``output=`` *stream* to specify an output stream other than stdout::

   >>> evaluate(parser, sents)

ispunc
------

The function ``ispunc()`` returns True if every character
in the given string has a Unicode category beginning with "P"::

   >>> ispunc('.')
   True
   >>> ispunc('Dr.')
   False
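
The character test described above can be sketched with the standard
``unicodedata`` module.  This is a minimal illustration, not the
library's actual implementation; the name ``ispunc_sketch`` is
hypothetical, and its result for the empty string (True, because
``all()`` of nothing is True) is an assumption that may differ from
``ispunc()``.

```python
import unicodedata

def ispunc_sketch(s):
    """Return True if every character of s has a Unicode general
    category beginning with "P" (Po, Pd, Ps, Pe, and so on).

    Hypothetical stand-in for ispunc(); behavior on the empty
    string is an assumption.
    """
    return all(unicodedata.category(c).startswith("P") for c in s)

print(ispunc_sketch("."))    # a single punctuation character
print(ispunc_sketch("Dr."))  # letters mixed with punctuation
```

Because the test is applied to every character, a token such as
``'Dr.'`` is not treated as punctuation even though it ends in a period.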

eval_sent
---------

The function eval_sent() evaluates a single sentence.  Its
arguments are *pred* and *truth*.  It considers the govrs
and roles of the predicted sentence, but the pgovrs and proles of the
true sentence.  (A projective dependency parser can produce
non-projective output if it ever fails to attach a word, so the output
of even a projective dependency parser is stored in the govr/role
slots rather than the pgovr/prole slots.)

The outputs are *las*, *uas*, *la*, *n*, where *las* is the
number of words that have the correct govr and role, *uas* is
the number of words that have the correct govr, *la* is the
number of words that have the correct role, and *n* is the
number of words.  Nota bene: these are counts, not proportions.
Note also that *n* will be less than the length of the
sentence: the length of the sentence includes the root token
(position 0), which is never included in *n*.
Also, by default, punctuation tokens are ignored.
One can cause them to be counted by specifying ``excludepunc=False``::

   >>> pred = next(conll_sents(ex.depsent3_pred))
   >>> gold = next(conll_sents(ex.depsent3_gold))
   >>> eval_sent(pred, gold)
   (2, 3, 2, 4)
   >>> eval_sent(pred, gold, excludepunc=False)
   (3, 4, 3, 5)
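
The counting scheme can be sketched on plain tuples, without Selkie's
sentence objects.  Everything here is an assumption made for
illustration: the function name ``count_correct``, the
``(form, govr, role)`` tuple layout, the omission of the root token
from the lists, and the toy sentence itself, which is invented and is
not the ``ex.depsent3`` data.

```python
import unicodedata

def is_punc(form):
    # Every character has a Unicode category beginning with "P".
    return all(unicodedata.category(c).startswith("P") for c in form)

def count_correct(pred, gold, excludepunc=True):
    """Return the counts (las, uas, la, n) for one sentence.

    pred and gold are parallel lists of (form, govr, role) tuples
    (a hypothetical layout); the root token is not in the lists.
    """
    las = uas = la = n = 0
    for (form, p_govr, p_role), (_, g_govr, g_role) in zip(pred, gold):
        if excludepunc and is_punc(form):
            continue                      # punctuation is skipped by default
        n += 1
        govr_ok = p_govr == g_govr
        role_ok = p_role == g_role
        uas += govr_ok                    # correct governor
        la += role_ok                     # correct role
        las += govr_ok and role_ok        # both correct
    return las, uas, la, n

# Invented toy data: "the" has the wrong governor, "cats" the wrong role.
pred = [("Dogs", 2, "subj"), ("chase", 0, "mv"), ("the", 2, "det"),
        ("cats", 2, "subj"), (".", 2, "punc")]
gold = [("Dogs", 2, "subj"), ("chase", 0, "mv"), ("the", 4, "det"),
        ("cats", 2, "obj"), (".", 2, "punc")]

print(count_correct(pred, gold))                     # (2, 3, 3, 4)
print(count_correct(pred, gold, excludepunc=False))  # (3, 4, 4, 5)
```

As in ``eval_sent()``, the results are counts, so dividing each of the
first three values by *n* gives the corresponding proportion.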

compare
-------

The function compare() prints out a detailed comparison of a
predicted and a gold sentence::

   >>> compare(pred, gold)
   1   This G R 2 subj 2 subj   
   2   is   G R 0 mv   0 mv     
   3   a        2 pt   4 det    
   4   test G   2 obj  2 prednom
   5 * .        2 obj  2 prednom
   
   LAS: 2 4 0.5 
   UAS: 3 4 0.75
   LA:  2 4 0.5

Punctuation tokens are marked with '\*' in the second column.
Tokens marked 'G' contribute to the UAS score, tokens marked
'R' contribute to the LA score, and tokens marked
'G R' contribute to the LAS score.
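
Each summary line in the listing above shows a count, the number of
words *n*, and their ratio.  The sketch below derives those ratios from
the four counts that ``eval_sent()`` returns.  The helper name
``proportions`` is hypothetical, and raising an error for *n* = 0 is an
assumption; the library may handle that case differently.

```python
def proportions(las, uas, la, n):
    """Convert the counts (las, uas, la, n) into proportions.

    Hypothetical helper; raises ValueError when there are no
    counted words, since the ratios are then undefined.
    """
    if n == 0:
        raise ValueError("no words to score")
    return {"LAS": las / n, "UAS": uas / n, "LA": la / n}

# Counts taken from the eval_sent(pred, gold) example in this section.
scores = proportions(2, 3, 2, 4)
for name, value in scores.items():
    print(name, value)
```

With the counts ``(2, 3, 2, 4)`` this gives 0.5, 0.75, and 0.5, which
agrees with the LAS, UAS, and LA lines printed by ``compare()``.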