Grammars — selkie.nlp.grammar

Grammars are currently context-free grammars (rewrite rules) with features and semantic translations. (A more abstract representation is in the planning stage.) Here is a simple example of the format. This is the contents of ex('g9.g'). In the section headers (e.g., “% Features”), the space following the percent sign is optional, and the capitalization of the section name does not matter:

% Features
nform = sg/pl
vform = nform/ing
trans = i/t
bool = +/- default -
% Categories
S   []
NP  [form:nform, wh:bool]
VP  [form:vform]
V   [form:vform, trans:trans]
N   [form:nform]
Det [form:nform]
% Rules
S -> NP[_f] VP[_f]
NP[_f] -> Det[_f] N[_f]
VP[_f] -> V[_f,i]
VP[_f] -> V[_f,t] NP
% Lexicon
the Det
a Det[sg]
cat N[sg]
dog N[sg]
dogs N[pl]
barks V[sg,i]
chases V[sg,t]

To load it:

>>> from selkie.data import ex
>>> from selkie.nlp.grammar import Grammar
>>> g = Grammar(ex('g9'))
>>> print(g)
Start: S

Rules:
    [0] S -> NP[_f,-] VP[_f]
    [1] NP[_f,-] -> Det[_f] N[_f]
    [2] VP[_f] -> V[_f,i]
    [3] VP[_f] -> V[_f,t] NP[pl/sg,-]

Lexicon:
    a Det[sg]
    barks V[sg,i]
    cat N[sg]
    chases V[sg,t]
    dog N[sg]
    dogs N[pl]
    the Det[pl/sg]

Documentation of Classes

class selkie.nlp.grammar.Lexicon.Entry

A lexical entry. It consists of a word, a part of speech, and an optional semantic translation.

>>> from selkie.nlp.features import C
>>> from selkie.nlp.grammar import Lexicon
>>> ent = Lexicon.Entry('dog', C('n'), 'DOG')
>>> ent.word
'dog'
>>> ent.pos
n
>>> ent.sem
'DOG'
class selkie.nlp.grammar.Lexicon

A Lexicon consists of a set of lexical entries.

define(word, pos[, sem])

The basic method. It takes a word, a part of speech (category), and an optional semantic value.

>>> lex = Lexicon()
>>> lex.define('cat', C(['n','sg']))
>>> print(lex)
cat n[sg]
__getitem__(word)

The lexicon can be accessed by word. The value is a list of entries.

>>> lex['cat']
[<Entry cat n[sg]>]

An error is signalled if the word is not present.

__len__()

The length of the lexicon is the number of entries.

>>> len(lex)
1
__iter__()

For purposes of iteration, the elements of a lexicon are entries.

>>> list(lex)
[<Entry cat n[sg]>]
class selkie.nlp.grammar.Rule(lhs, rhs, sem, symtab)

Grammar rules are represented by instances of the class Rule. A Rule has five attributes: lhs, rhs, bindings, variables, and sem. The lhs is a single category, and the rhs is a list of categories. The value for bindings is a list containing *’s, one for each variable used in the rule. The value for variables is a list of string representations for the variables, or None. The value for sem is an expression.

The constructor takes a lhs, rhs, sem, and a symbol table. The symbol table is a dict that maps variable names to integers from 0 to the size of the table. The symbol table is optional; if omitted, variables are anonymous. The length of the bindings list is the size of the symbol table, if provided. Otherwise, it is one greater than the largest numeric variable occurring in either the lhs or rhs.

>>> from selkie.nlp.grammar import Rule
>>> r = Rule('vp', ['v', 'np'], 'foo')
>>> r.lhs
'vp'
>>> r.rhs
['v', 'np']
>>> r.bindings
[]
>>> r.sem
'foo'
class selkie.nlp.grammar.Grammar

The Grammar class has a similar structure to the Lexicon class. Internally, it maintains two indices. A rule of form X -> Y1 … Yn is indexed by X in the lefthand side index, and it is indexed by Y1 in the righthand side index.

define(lhs, rhs[, sem, symtab])

The basic method. It takes a lhs, rhs, an optional semantic translation, and an optional symbol table.

>>> from selkie.nlp.grammar import Grammar
>>> g = Grammar()
>>> g.define(C('s'), [C('np'), C('vp')])
>>> g.define(C('vp'), [C('v'), C('np')])
>>> print(g)
Start: s

Rules:
    [0] s -> np vp
    [1] vp -> v np
start

The attribute start contains the start category. It defaults to the lhs of the first rule defined.

>>> g.start
s
expansions(cat)

Takes a string X and returns the list of rules of form X -> Y1 … Yn. Note that the input is just a string, not a full category.

>>> g.expansions('vp')
[<vp -> v np>]
continuations(cat)

Returns the list of rules whose righthand side begins with a given symbol. For example:

>>> g.continuations('v')
[<vp -> v np>]
declarations

The value of declarations is generally None, unless the grammar is created by the grammar loader from a file that contains declarations.

lexicon

The lexicon.

class selkie.nlp.grammar.GrammarLoader(fn)

The GrammarLoader reads a grammar file.

load(fn, dir=None)

Load a file. If dir is provided and fn is not an absolute pathname, dir is prefixed to fn. If fn ends in .g, then load_generic() is called on it. Otherwise, load_generic() is called on fn + .g and load_lex() is called on fn + .lex.

load_generic(fn)

The file is opened and converted to a token stream using lines_to_tokens(). Until the end of file, the next two tokens are consumed. The first must be a percent sign, and the second must be a word. (Otherwise, an error is signalled.) The word (lowercased) is passed to handle_section(), along with the rest of the token stream.

handle_section(what, tokens)

This method processes one section. The argument what indicates the type of section. If the section type is not known, handle_section() should immediately return False. If the section is successfully processed, it should return True and leave the token stream pointing at the percent sign that begins the next section (or EOF).

This method may be overridden by specializations of GrammarLoader. For example:

class FooLoader (GrammarLoader):
    def handle_section (self, what, tokens):
        if what == 'foo':
            self.scan_foo(tokens)
            return True
        else:
            return GrammarLoader.handle_section(self, what, tokens)