TOPICS (Click to Navigate)

Pages

Friday, April 3, 2020

Context Free Grammar (CFG) for NLP

Context Free Grammar (CFG) for NLP, formal definition of context-free grammar, CFG examples in NLP



Context Free Grammar (CFG)


Definition:
A context-free grammar consists of a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a lexicon of words and symbols.
[Source - Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition by Daniel Jurafsky and James H. Martin]
 
Definition:
A context-free grammar (CFG) is a list of rules that define the set of all well-formed sentences in a language. Each rule has a left-hand side, which identifies a syntactic category, and a right-hand side, which defines its alternative component parts, reading from left to right.

Context Free Grammar (CFG) - Formal Definition

Context-free grammar G is a 4-tuple.
G = (V, T, S, P)
 These parameters are as follows;

  • VSet of variables (also called as Non-terminal symbols)

  • TSet of terminal symbols (lexicon)

    • The symbols that refer to words in a language are called terminal symbols.

    • Lexicon is a set of rules that introduce these symbols.

  • SDesignated start symbol (one of the non-terminals, S V)

  • PSet of productions (also called as rules).

    • Each rule in P is of the form A → s, where

      • A is a non-terminal (variable) symbol.

      • Each rule can have only one non-terminal symbol on the left hand side of the rule.

      • s is a sequence of terminals and non-terminals. It is from (T U V)*, infinite set of strings.

  • A grammar G generates a language L.

Example context-free grammar
G = (V, T, S, P)
   V = {S, NP, VP, PP, Det, Noun, Verb, Aux, Pre}
   T = {‘a’, ‘ate’, ‘cake’, ‘child’, ‘fork’, ‘the’, ‘with’}
   S = S
   P = { S → NP VP
   NP → Det Noun | NP PP
   PP → Pre NP
   VP → Verb NP
   Det → ‘a’ | ‘the’
   Noun → ‘cake’ | ‘child’ | ‘fork’
   Pre → ‘with’
   Verb → ‘ate’}
Some notes:

  • Note 1: In P, pipe symbol (|) is used to combine productions into single representation for productions that have same LHS. For example, Det → ‘a’ | ‘the’ derived from two rules Det → ‘a’ and Det → ‘the’. Yet it denotes two rules not one.

  • Note 2: The production highlighted in red are referred as grammar, and green are referred as lexicon.

  • Note 3: NP – Noun Phrase, VP – Verb Phrase, PP – Prepositional Phrase, Det – Determiner, Aux – Auxiliary verb


Sample derivation:
S → NP VP
   → Det Noun VP
   → the Noun VP
   → the child VP
   → the child Verb NP
   → the child ate NP
   → the child ate Det Noun
   → the child ate a Noun
   → the child ate a cake

From this derivation, we can understand that the sentence ‘the child ate a cake’ is a valid and accepted sentence by the grammar G. in other words, the sentence is part of the language produced by G.

***********




Define context free grammar

How to derive sentences from cfg?

what is lexicon in CFG

What are terminal and non-terminal symbols in CFG

CFG sample solved problem




1 comment: