Evaluation — selkie.nlp.dp.eval
The following functions are in the module selkie.nlp.dp.eval:
>>> from selkie.nlp.dp.eval import *
>>> from selkie import ex
>>> from selkie.dep import conll_sents
evaluate
This is the main function. It takes a parser and a list of sentences
with gold pgovrs and proles, and prints out evaluation information.
The parser should place its output in the govr and role slots, not
pgovr and prole. Punctuation tokens are ignored by default; specify
excludepunc=False to count them. To send the report to a stream
other than stdout, pass that stream as the output argument:
>>> evaluate(parser, sents)
ispunc
The function ispunc() returns True if all the characters in the given string have a Unicode category beginning with “P”:
>>> ispunc('.')
True
>>> ispunc('Dr.')
False
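The test described above can be approximated with the standard unicodedata module. The following is only a sketch, not the library's code: the name ispunc_sketch is hypothetical, and returning False for the empty string is an assumption, since the text does not say how ispunc() treats it.

```python
import unicodedata

def ispunc_sketch(s):
    """Return True if every character of s has a Unicode category starting with 'P'."""
    # Assumption: the empty string counts as non-punctuation. Without the
    # bool(s) guard, all() over an empty string would return True.
    return bool(s) and all(unicodedata.category(c).startswith('P') for c in s)

print(ispunc_sketch('.'))    # True  (category Po)
print(ispunc_sketch('Dr.'))  # False ('D' and 'r' are letters)
print(ispunc_sketch('$'))    # False (category Sc, a symbol, not punctuation)
```

Note that symbols such as '$' or '+' fall in the 'S' categories, so under this definition they are counted as ordinary tokens.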
eval_sent
The function eval_sent() evaluates a single sentence. Its arguments are pred and truth. It considers the govrs and roles of the predicted sentence, but the pgovrs and proles of the true sentence. (A projective dependency parser can produce non-projective output if it ever fails to attach a word, so the output of even a projective dependency parser is stored in the govr/role slots rather than the pgovr/prole slots.)
The outputs are las, uas, la, n, where las is the
number of words that have the correct govr and role, uas is
the number of words that have the correct govr, la is the
number of words that have the correct role, and n is the
number of words. Nota bene: these are counts, not proportions.
Note also that n will be less than the length of the
sentence. The length of the sentence includes the root token
(position 0), which is never included in n.
Also, by default, punctuation tokens are ignored.
One can cause them to be counted by specifying excludepunc=False:
>>> pred = next(conll_sents(ex.depsent3_pred))
>>> gold = next(conll_sents(ex.depsent3_gold))
>>> eval_sent(pred, gold)
(2, 3, 2, 4)
>>> eval_sent(pred, gold, excludepunc=False)
(3, 4, 3, 5)
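The counting rules described above can be illustrated without the library. This is a minimal sketch under stated assumptions: sentences are represented as plain lists of (govr, role) pairs rather than selkie sentence objects, the names is_punct and count_matches are hypothetical, and the example sentence and role labels are made up for illustration.

```python
import unicodedata

def is_punct(form):
    # Hypothetical stand-in for ispunc(): all characters in a 'P' category.
    return bool(form) and all(unicodedata.category(c).startswith('P') for c in form)

def count_matches(forms, pred, gold, excludepunc=True):
    """Return (las, uas, la, n) as counts, not proportions.

    forms[i] is the word at position i; pred[i] and gold[i] are
    (govr, role) pairs. Position 0 is the root token and is skipped.
    """
    las = uas = la = n = 0
    for i in range(1, len(forms)):
        if excludepunc and is_punct(forms[i]):
            continue  # punctuation is ignored unless excludepunc=False
        (pg, pr), (gg, gr) = pred[i], gold[i]
        n += 1
        uas += int(pg == gg)              # correct govr
        la += int(pr == gr)               # correct role
        las += int(pg == gg and pr == gr) # both correct
    return las, uas, la, n

# Made-up example data; position 0 is the root placeholder.
forms = ['*root*', 'Dogs', 'bark', 'loudly', '!']
gold = [None, (2, 'subj'), (0, 'mv'), (2, 'mod'), (2, 'pu')]
pred = [None, (2, 'subj'), (0, 'mv'), (1, 'mod'), (2, 'pu')]

print(count_matches(forms, pred, gold))                     # (2, 2, 3, 3)
print(count_matches(forms, pred, gold, excludepunc=False))  # (3, 3, 4, 4)
```

In this example the sentence has five positions, but n is 3 by default (root and punctuation skipped) and 4 when punctuation is counted.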
compare
The function compare() prints out a detailed comparison of a predicted and a gold sentence:
>>> compare(pred, gold)
1   This G R 2 subj 2 subj
2   is   G R 0 mv   0 mv
3   a        2 pt   4 det
4   test G   2 obj  2 prednom
5 * .        2 obj  2 prednom
LAS: 2 4 0.5
UAS: 3 4 0.75
LA: 2 4 0.5
Punctuation tokens are marked with ‘*’ in the second column. Tokens marked ‘G’ contribute to the UAS score, tokens marked ‘R’ contribute to the LA score, and tokens marked ‘G R’ contribute to the LAS score.
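Each summary line gives a count, the number of words n, and their ratio. The sketch below shows how such proportions could be derived from (las, uas, la, n) tuples. The function name scores is hypothetical, and pooling the counts across sentences before dividing is an assumption about how a corpus-level figure might be computed, not a statement of what evaluate() does.

```python
def scores(counts):
    """Pool (las, uas, la, n) tuples and return (LAS, UAS, LA) as proportions."""
    las = sum(c[0] for c in counts)
    uas = sum(c[1] for c in counts)
    la = sum(c[2] for c in counts)
    n = sum(c[3] for c in counts)
    if n == 0:
        # Assumption: report zeros when there are no scorable words.
        return (0.0, 0.0, 0.0)
    return (las / n, uas / n, la / n)

# One sentence with the counts (2, 3, 2, 4):
print(scores([(2, 3, 2, 4)]))                # (0.5, 0.75, 0.5)

# Two sentences pooled: 5/8, 6/8, 6/8.
print(scores([(2, 3, 2, 4), (3, 3, 4, 4)]))  # (0.625, 0.75, 0.75)
```

Pooling counts weights each word equally, so long sentences contribute more than short ones; averaging per-sentence proportions would weight each sentence equally instead.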