Context Free Grammar (CFG) for NLP, formal definition of context-free grammar, CFG examples in NLP
Context Free Grammar (CFG)
Definition:
A context-free
grammar consists of a set of rules or productions, each of which expresses
the ways that symbols of the language can be grouped and ordered together, and
a lexicon of words and symbols.
[Source
- Speech and Language Processing: An introduction to natural language
processing, computational linguistics, and speech recognition by Daniel
Jurafsky and James H. Martin]
|
Definition:
A context-free
grammar (CFG) is a list of rules that define the set of all well-formed
sentences in a language. Each rule has a left-hand side, which identifies a
syntactic category, and a right-hand side, which defines its alternative
component parts, reading from left to right.
|
Context Free Grammar (CFG) - Formal Definition
Context-free
grammar G is a 4-tuple.
G
= (V, T, S, P)
These parameters are as follows;
- V – Set of variables (also called as Non-terminal symbols)
- T – Set of terminal symbols (lexicon)
- The symbols that refer to words in a language are called terminal symbols.
- Lexicon is a set of rules that introduce these symbols.
- S – Designated start symbol (one of the non-terminals, S ∈ V)
- P – Set of productions (also called as rules).
- Each rule in P is of the form A → s, where
- A is a non-terminal (variable) symbol.
- Each rule can have only one non-terminal symbol on the left hand side of the rule.
- s is a sequence of terminals and non-terminals. It is from (T U V)*, infinite set of strings.
- A grammar G generates a language L.
Example context-free
grammar
G
= (V, T, S, P)
V
= {S, NP, VP, PP, Det, Noun, Verb, Aux, Pre}
T
= {‘a’, ‘ate’, ‘cake’, ‘child’, ‘fork’, ‘the’, ‘with’}
S
= S
P
= { S → NP VP
NP → Det Noun | NP PP
PP → Pre NP
VP → Verb NP
Det → ‘a’ | ‘the’
Noun → ‘cake’ | ‘child’ | ‘fork’
Pre → ‘with’
Verb → ‘ate’}
Some notes:
- Note 1: In P, pipe symbol (|) is used to combine productions into single representation for productions that have same LHS. For example, Det → ‘a’ | ‘the’ derived from two rules Det → ‘a’ and Det → ‘the’. Yet it denotes two rules not one.
- Note 2: The production highlighted in red are referred as grammar, and green are referred as lexicon.
- Note 3: NP – Noun Phrase, VP – Verb Phrase, PP – Prepositional Phrase, Det – Determiner, Aux – Auxiliary verb
Sample
derivation:
S →
NP VP
→
Det Noun VP
→
the Noun VP
→
the child VP
→
the child Verb NP
→
the child ate NP
→
the child ate Det Noun
→
the child ate a Noun
→
the child ate a cake
From
this derivation, we can understand that the sentence ‘the child ate a cake’ is
a valid and accepted sentence by the grammar G. in other words, the sentence is
part of the language produced by G.
***********
great
ReplyDelete