Grammars — selkie.nlp.grammar
Grammars are currently context-free grammars (rewrite rules)
with features and semantic translations. (A more abstract
representation is in the planning stage.) Here is a simple
example of the format. This is the contents of ex('g9.g').
In the section headers (e.g., “% Features”), the space
following the percent sign is optional, and the capitalization of
the section name does not matter:
% Features
nform = sg/pl
vform = nform/ing
trans = i/t
bool = +/- default -
% Categories
S []
NP [form:nform, wh:bool]
VP [form:vform]
V [form:vform, trans:trans]
N [form:nform]
Det [form:nform]
% Rules
S -> NP[_f] VP[_f]
NP[_f] -> Det[_f] N[_f]
VP[_f] -> V[_f,i]
VP[_f] -> V[_f,t] NP
% Lexicon
the Det
a Det[sg]
cat N[sg]
dog N[sg]
dogs N[pl]
barks V[sg,i]
chases V[sg,t]
To load it:
>>> from selkie.data import ex
>>> from selkie.nlp.grammar import Grammar
>>> g = Grammar(ex('g9'))
>>> print(g)
Start: S
Rules:
[0] S -> NP[_f,-] VP[_f]
[1] NP[_f,-] -> Det[_f] N[_f]
[2] VP[_f] -> V[_f,i]
[3] VP[_f] -> V[_f,t] NP[pl/sg,-]
Lexicon:
a Det[sg]
barks V[sg,i]
cat N[sg]
chases V[sg,t]
dog N[sg]
dogs N[pl]
the Det[pl/sg]
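The section-header convention described earlier (optional space after the percent sign, case-insensitive section name) can be illustrated with a small standalone sketch. This is not selkie's loader: `section_name` and `split_sections` are hypothetical helpers, and the regular expression is only an assumption about what counts as a header line.

```python
import re

# Assumed header shape: a percent sign, optional whitespace, then one word.
_HEADER = re.compile(r"^%\s*([A-Za-z_]\w*)\s*$")


def section_name(line):
    """Return the lowercased section name if line is a header, else None."""
    m = _HEADER.match(line.strip())
    return m.group(1).lower() if m else None


def split_sections(text):
    """Group the non-blank lines of a grammar file by section name."""
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        name = section_name(line)
        if name is not None:
            current = name
            sections.setdefault(current, [])
        elif current is not None:
            # Lines before the first header are silently ignored here.
            sections[current].append(line.strip())
    return sections


sample = """%Features
nform = sg/pl
% RULES
S -> NP[_f] VP[_f]
% lexicon
the Det
"""

print(split_sections(sample))
# {'features': ['nform = sg/pl'], 'rules': ['S -> NP[_f] VP[_f]'], 'lexicon': ['the Det']}
```

All three spellings of the header (`%Features`, `% RULES`, `% lexicon`) normalize to lowercase keys, which is the behavior the format description implies.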
Documentation of Classes
- class selkie.nlp.grammar.Lexicon.Entry
A lexical entry. It consists of a word, a part of speech, and an optional semantic translation.
>>> from selkie.nlp.features import C
>>> from selkie.nlp.grammar import Lexicon
>>> ent = Lexicon.Entry('dog', C('n'), 'DOG')
>>> ent.word
'dog'
>>> ent.pos
n
>>> ent.sem
'DOG'
- class selkie.nlp.grammar.Lexicon
A Lexicon consists of a set of lexical entries.
- define(word, pos[, sem])
The basic method. It takes a word, a part of speech (category), and an optional semantic value.
>>> lex = Lexicon()
>>> lex.define('cat', C(['n','sg']))
>>> print(lex)
cat n[sg]
- __getitem__(word)
The lexicon can be accessed by word. The value is a list of entries.
>>> lex['cat']
[<Entry cat n[sg]>]
An error is signalled if the word is not present.
- __len__()
The length of the lexicon is the number of entries.
>>> len(lex)
1
- __iter__()
For purposes of iteration, the elements of a lexicon are entries.
>>> list(lex)
[<Entry cat n[sg]>]
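The Lexicon behavior documented here (lookup by word returns a list of entries, length counts entries, iteration yields entries) can be mimicked with a short standalone sketch. `MiniLexicon` and `Entry` are hypothetical stand-ins, not selkie's classes. The sketch assumes that the error for a missing word is a `KeyError` and that iteration follows alphabetical word order; neither detail is stated by the source.

```python
from collections import namedtuple

# Stand-in for Lexicon.Entry: a word, a part of speech, optional semantics.
Entry = namedtuple("Entry", ["word", "pos", "sem"])


class MiniLexicon:
    """Toy lexicon keyed by word; NOT selkie's implementation."""

    def __init__(self):
        self._table = {}  # word -> list of Entry

    def define(self, word, pos, sem=None):
        # A word may have several entries (e.g. noun and verb readings).
        self._table.setdefault(word, []).append(Entry(word, pos, sem))

    def __getitem__(self, word):
        # Assumption: an unknown word raises KeyError.
        return self._table[word]

    def __len__(self):
        # Length counts entries, not distinct words.
        return sum(len(entries) for entries in self._table.values())

    def __iter__(self):
        # Assumption: entries are visited in alphabetical word order.
        for word in sorted(self._table):
            yield from self._table[word]


lex = MiniLexicon()
lex.define("cat", "n[sg]")
lex.define("saw", "n[sg]")
lex.define("saw", "v[sg,t]", "SEE")

print(len(lex))                     # 3
print([e.pos for e in lex["saw"]])  # ['n[sg]', 'v[sg,t]']
```

Storing a list per word is what makes the length differ from the number of distinct words once a word is ambiguous.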
- class selkie.nlp.grammar.Rule(lhs, rhs, sem, symtab)
Grammar rules are represented by instances of the class Rule. A Rule has five attributes: lhs, rhs, bindings, variables, and sem. The lhs is a single category, and the rhs is a list of categories. The value for bindings is a list containing *’s, one for each variable used in the rule. The value for variables is a list of string representations for the variables, or None. The value for sem is an expression.
The constructor takes a lhs, rhs, sem, and a symbol table. The symbol table is a dict that maps variable names to integers from 0 to the size of the table. The symbol table is optional; if omitted, variables are anonymous. The length of the bindings list is the size of the symbol table, if provided. Otherwise, it is one greater than the largest numeric variable occurring in either the lhs or rhs.
>>> from selkie.nlp.grammar import Rule
>>> r = Rule('vp', ['v', 'np'], 'foo')
>>> r.lhs
'vp'
>>> r.rhs
['v', 'np']
>>> r.bindings
[]
>>> r.sem
'foo'
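The rule for sizing the bindings list can be made concrete with a toy model. The function `bindings_for` below is hypothetical. It assumes a category is a tuple whose first element is the category name and whose remaining elements are feature values, with an `int` standing for a numeric variable and the string `"*"` standing in for an unbound slot. selkie's real category objects are not modeled.

```python
def bindings_for(lhs, rhs, symtab=None):
    """Return an initial bindings list for a rule (toy model, not selkie's)."""
    if symtab is not None:
        # With a symbol table, there is one slot per named variable.
        n = len(symtab)
    else:
        # Otherwise: one greater than the largest numeric variable,
        # or zero slots if the rule uses no variables at all.
        variables = [
            value
            for cat in [lhs, *rhs]
            for value in cat[1:]
            if isinstance(value, int)
        ]
        n = max(variables) + 1 if variables else 0
    return ["*"] * n


# No variables: empty bindings.
print(bindings_for(("vp",), [("v",), ("np",)]))                 # []

# Variables 0 and 1 occur, so two slots are needed.
print(bindings_for(("vp", 0), [("v", 0, "t"), ("np", 1)]))      # ['*', '*']

# A symbol table of size 3 fixes the length at 3.
print(bindings_for(("s",), [("np", 0)], {"_f": 0, "_g": 1, "_h": 2}))
# ['*', '*', '*']
```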
- class selkie.nlp.grammar.Grammar
The Grammar class has a similar structure to the Lexicon class. Internally, it maintains two indices. A rule of form X -> Y1 … Yn is indexed by X in the lefthand side index, and it is indexed by Y1 in the righthand side index.
- define(lhs, rhs[, sem, symtab])
The basic method. It takes a lhs, rhs, an optional semantic translation, and an optional symbol table.
>>> from selkie.nlp.grammar import Grammar
>>> g = Grammar()
>>> g.define(C('s'), [C('np'), C('vp')])
>>> g.define(C('vp'), [C('v'), C('np')])
>>> print(g)
Start: s
Rules:
[0] s -> np vp
[1] vp -> v np
- start
The attribute start contains the start category. It defaults to the lhs of the first rule defined.
>>> g.start
s
- expansions(cat)
Takes a string X and returns the list of rules of form X -> Y1 … Yn. Note that the input is just a string, not a full category.
>>> g.expansions('vp')
[<vp -> v np>]
- continuations(cat)
Returns the list of rules whose righthand side begins with a given symbol. For example:
>>> g.continuations('v')
[<vp -> v np>]
- declarations
The value of declarations is generally None, unless the grammar is created by the grammar loader from a file that contains declarations.
- lexicon
The lexicon.
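The two-index organization described for Grammar can be sketched in a few lines. `MiniGrammar` is a hypothetical stand-in that uses plain strings as categories. It only illustrates how indexing by left-hand side and by first right-hand-side symbol supports expansions() and continuations(), and it does not model features, semantics, or symbol tables.

```python
from collections import defaultdict, namedtuple

# Simplified rule: categories are plain strings here.
Rule = namedtuple("Rule", ["lhs", "rhs"])


class MiniGrammar:
    """Toy grammar with two rule indices; NOT selkie's implementation."""

    def __init__(self):
        self.rules = []
        self.start = None
        self._by_lhs = defaultdict(list)    # X  -> rules X -> Y1 ... Yn
        self._by_first = defaultdict(list)  # Y1 -> rules X -> Y1 ... Yn

    def define(self, lhs, rhs):
        rule = Rule(lhs, tuple(rhs))
        if self.start is None:
            # The start category defaults to the lhs of the first rule.
            self.start = lhs
        self.rules.append(rule)
        self._by_lhs[lhs].append(rule)
        if rhs:
            self._by_first[rhs[0]].append(rule)

    def expansions(self, cat):
        return list(self._by_lhs.get(cat, []))

    def continuations(self, cat):
        return list(self._by_first.get(cat, []))


g = MiniGrammar()
g.define("s", ["np", "vp"])
g.define("vp", ["v", "np"])
g.define("vp", ["v"])

print(g.start)                    # s
print(len(g.expansions("vp")))    # 2
print(g.continuations("np"))      # [Rule(lhs='s', rhs=('np', 'vp'))]
```

The left-hand-side index suits top-down expansion of a category, while the first-symbol index suits bottom-up parsing, where a completed constituent needs to find the rules it could begin.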
- class selkie.nlp.grammar.GrammarLoader(fn)
The GrammarLoader reads a grammar file.
- load(fn, dir=None)
Load a file. If dir is provided and fn is not an absolute pathname, dir is prefixed to fn. If fn ends in .g, then load_generic() is called on it. Otherwise, load_generic() is called on fn + .g and load_lex() is called on fn + .lex.
- load_generic(fn)
The file is opened and converted to a token stream using lines_to_tokens(). Until the end of file, the next two tokens are consumed. The first must be a percent sign, and the second must be a word. (Otherwise, an error is signalled.) The word (lowercased) is passed to handle_section(), along with the rest of the token stream.
- handle_section(what, tokens)
This method processes one section. The argument what indicates the type of section. If the section type is not known, handle_section() should immediately return False. If the section is successfully processed, it should return True and leave the token stream pointing at the percent sign that begins the next section (or EOF).
This method may be overridden by specializations of GrammarLoader. For example:
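from selkie.nlp.grammar import GrammarLoader

class FooLoader(GrammarLoader):
    def handle_section(self, what, tokens):
        if what == 'foo':
            # Handle the custom section; scan_foo is supplied by the subclass.
            self.scan_foo(tokens)
            return True
        else:
            # Defer every other section type to the standard loader.
            return GrammarLoader.handle_section(self, what, tokens)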
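The interplay between the section loop and handle_section() can be tried out without selkie using the standalone sketch below. `TokenStream`, `MiniLoader`, `load_tokens`, and `scan_body` are hypothetical names. Raising `ValueError` when a section type is unknown is an assumption, since the source only says that handle_section() returns False in that case.

```python
class TokenStream:
    """Minimal token stream over a list of strings (hypothetical stand-in)."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def at_eof(self):
        return self._pos >= len(self._tokens)

    def peek(self):
        return None if self.at_eof() else self._tokens[self._pos]

    def next(self):
        if self.at_eof():
            raise ValueError("unexpected end of input")
        tok = self._tokens[self._pos]
        self._pos += 1
        return tok


class MiniLoader:
    """Toy section dispatcher; NOT selkie's GrammarLoader."""

    KNOWN = ("features", "categories", "rules", "lexicon")

    def __init__(self):
        self.sections = {}

    def load_tokens(self, tokens):
        # Each section starts with a percent sign followed by a word.
        while not tokens.at_eof():
            if tokens.next() != "%":
                raise ValueError("expected '%' at start of section")
            word = tokens.next()
            if not word.isalpha():
                raise ValueError("expected a section name")
            if not self.handle_section(word.lower(), tokens):
                # Assumption: an unhandled section type is an error.
                raise ValueError("unknown section: " + word)

    def handle_section(self, what, tokens):
        if what not in self.KNOWN:
            return False
        self.sections[what] = self.scan_body(tokens)
        return True

    def scan_body(self, tokens):
        # Consume tokens up to the next '%' (or EOF), leaving it unread.
        body = []
        while not tokens.at_eof() and tokens.peek() != "%":
            body.append(tokens.next())
        return body


class FooLoader(MiniLoader):
    def handle_section(self, what, tokens):
        if what == "foo":
            self.sections["foo"] = self.scan_body(tokens)
            return True
        return super().handle_section(what, tokens)


loader = FooLoader()
loader.load_tokens(TokenStream(["%", "Rules", "S", "->", "NP", "VP", "%", "FOO", "x", "y"]))
print(loader.sections)
# {'rules': ['S', '->', 'NP', 'VP'], 'foo': ['x', 'y']}
```

The contract that matters is that each handler stops at the next percent sign, so the loop can read the following header without any lookahead bookkeeping of its own.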