Natural Language Processing & Interpretation

TIN173/DIT410 Artificial Intelligence
John J. Camilleri
2017-04-04

What is natural language?

Natural language

Formal language

Natural Language Processing (NLP)

Examples of NLP tasks

Rules vs Statistics

Phrase-Structure Grammars

Context-free grammars

A context-free grammar is a 4-tuple G = (V, Σ, R, S) where:

V is a finite set of non-terminal symbols
Σ is a finite set of terminal symbols, disjoint from V
R is a finite set of production rules of the form A → γ, with A ∈ V and γ ∈ (V ∪ Σ)*
S ∈ V is the start symbol

The language L of a grammar G is the set of strings which have a valid derivation from S, where ⇒* denotes derivation in zero or more rule applications:

L(G) = { w ∈ Σ* : S ⇒* w }

Example grammar

S = sentence, NP = noun phrase, VP = verb phrase, Det = determiner, N = noun, V = verb

Rules

S  → NP VP
NP → Det N
VP → V NP

Lexicon

N   → man | mountain
V   → saw
Det → a | the
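
As an example of ⇒*, the sentence "the man saw a mountain" has the following leftmost derivation in this grammar, expanding the leftmost symbol at each step:

S ⇒ NP VP
  ⇒ Det N VP
  ⇒ the N VP
  ⇒ the man VP
  ⇒ the man V NP
  ⇒ the man saw NP
  ⇒ the man saw Det N
  ⇒ the man saw a N
  ⇒ the man saw a mountain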

"the man saw a mountain"

(S (NP (Det the) (N man)) (VP (V saw) (NP (Det a) (N mountain))))

"the man saw a"

The words can all be tagged (Det N V Det), but no complete tree starting with S can be built: "the man saw a" is not in L(G).
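
Both results can be checked mechanically. Here is a minimal sketch using Python's NLTK toolkit (an assumed choice; any CFG parser would do), with the grammar above written in NLTK's notation:

import nltk  # assumes the nltk package is installed

# The toy grammar and lexicon from above, in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N
VP -> V NP
N   -> 'man' | 'mountain'
V   -> 'saw'
Det -> 'a' | 'the'
""")
parser = nltk.ChartParser(grammar)

# "the man saw a mountain": exactly one parse tree.
for tree in parser.parse("the man saw a mountain".split()):
    print(tree)

# "the man saw a": no trees at all, the string is not in L(G).
print(list(parser.parse("the man saw a".split())))  # []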

Bigger example

We add prepositions Prep ("with", "on") and prepositional phrases PP.

Rules

S  → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → Prep NP

Lexicon

NP   → Mary
N    → man | mountain | telescope
V    → saw
Det  → a | the
Prep → with | on

"Mary saw the man on the mountain with a telescope"

"saw ... with a telescope"

"... on the mountain with a telescope"

"the man ... with a telescope"

Syntactic analysis (parsing)

Algorithms

Parsing demos in Python
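
As a sketch of such a demo (again assuming NLTK as the toolkit), the bigger grammar can be loaded and run on the ambiguous sentence; with two prepositional phrases to attach, it yields five distinct parse trees:

import nltk

# The bigger grammar, with prepositional phrases, in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | NP PP | 'Mary'
VP -> V NP | VP PP
PP -> Prep NP
N    -> 'man' | 'mountain' | 'telescope'
V    -> 'saw'
Det  -> 'a' | 'the'
Prep -> 'with' | 'on'
""")
parser = nltk.ChartParser(grammar)

sentence = "Mary saw the man on the mountain with a telescope".split()
trees = list(parser.parse(sentence))
print(len(trees))   # 5: one tree per combination of PP attachments
for tree in trees:
    print(tree)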

CYK algorithm

Mary saw the man with a telescope

      1     2     3     4     5      6     7
0     NP    -     -     S     -      -     S¹, S²
1           V     -     VP    -      -     VP¹, VP²
2                 Det   NP    -      -     NP
3                       N     -      -     -
4                             Prep   -     PP
5                                    Det   NP
6                                          N

Row i, column j gives the contents of chart cell [i,j]: the categories that can derive words i+1 through j.

Note: at [1,7] there are two ways of building a VP, introducing ambiguity.

S¹:
(S (NP Mary) (VP (V saw) (NP (NP (Det the) (N man)) (PP (Prep with) (NP (Det a) (N telescope))))))

S²:
(S (NP Mary) (VP (VP (V saw) (NP (Det the) (N man))) (PP (Prep with) (NP (Det a) (N telescope)))))
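
The chart above can be filled by a small bottom-up procedure. Below is a minimal CYK recognizer sketch in Python (the data-structure names are our own); the example grammar is already in the binary-rules-plus-word-rules form that CYK requires:

from collections import defaultdict

# Binary rules and lexicon of the example grammar. Every rule is either
# binary or rewrites a category to a single word, so CYK applies directly.
binary = [("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "PP"),
          ("VP", "V", "NP"), ("VP", "VP", "PP"), ("PP", "Prep", "NP")]
lexicon = {"Mary": {"NP"}, "saw": {"V"}, "the": {"Det"}, "a": {"Det"},
           "man": {"N"}, "mountain": {"N"}, "telescope": {"N"},
           "with": {"Prep"}, "on": {"Prep"}}

def cyk(words):
    n = len(words)
    chart = defaultdict(set)           # chart[i, j] = categories spanning words i..j-1
    for i, word in enumerate(words):
        chart[i, i + 1] = set(lexicon.get(word, ()))
    for width in range(2, n + 1):      # fill the chart bottom-up, by span width
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point of the span
                for lhs, left, right in binary:
                    if left in chart[i, k] and right in chart[k, j]:
                        chart[i, j].add(lhs)
    return chart

chart = cyk("Mary saw the man with a telescope".split())
print(chart[0, 7])  # {'S'}: the sentence is in the language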

Probabilistic parsing

PCFG example

Rules

S  → NP VP    [1.0]
NP → Det N    [0.5]
NP → NP PP    [0.4]
VP → V NP     [0.8]
VP → VP PP    [0.2]
PP → Prep NP  [1.0]

Lexicon

NP   → Mary [0.1]
N    → man  [0.6] | mountain [0.2] | telescope [0.2]
V    → saw  [1.0]
Det  → a    [0.4] | the [0.6]
Prep → with [0.5] | on  [0.5]

Parse tree probability

Two of the parses of "Mary saw the man on the mountain with a telescope":

T¹, with "on the mountain" attached to "the man" and "with a telescope" attached to the verb phrase:

(S (NP Mary) (VP (VP (V saw) (NP (NP (Det the) (N man)) (PP (Prep on) (NP (Det the) (N mountain))))) (PP (Prep with) (NP (Det a) (N telescope)))))

P(T¹) = 1.0 × 0.1 × 0.2 × 0.8 × 1.0 × 0.4 × 0.5 × 0.6 × 0.6 × 1.0 × 0.5 × 0.5 × 0.6 × 0.2 × 1.0 × 0.5 × 0.5 × 0.4 × 0.2 = 0.0000006912 = 6.912 × 10⁻⁷

T², with both prepositional phrases attached inside the object noun phrase:

(S (NP Mary) (VP (V saw) (NP (NP (Det the) (N man)) (PP (Prep on) (NP (NP (Det the) (N mountain)) (PP (Prep with) (NP (Det a) (N telescope))))))))

P(T²) = 1.0 × 0.1 × 0.8 × 1.0 × 0.4 × 0.5 × 0.6 × 0.6 × 1.0 × 0.5 × 0.4 × 0.5 × 0.6 × 0.2 × 1.0 × 0.5 × 0.5 × 0.4 × 0.2 = 0.0000013824 = 1.3824 × 10⁻⁶

Of these two, T² is more probable: NP attachment (0.4) beats VP attachment (0.2) in this grammar.
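
NLTK can do this computation for us. The sketch below (an assumed toolkit choice, as before) loads the PCFG and asks its Viterbi parser for a most probable parse; NLTK requires the probabilities for each left-hand side to sum to 1, which this grammar satisfies. Two of the five trees tie for the maximum probability, so the parser returns one of them:

import nltk

# The PCFG above in NLTK's notation: terminals are quoted, and each
# production carries its probability in square brackets.
pcfg = nltk.PCFG.fromstring("""
S    -> NP VP [1.0]
NP   -> Det N [0.5] | NP PP [0.4] | 'Mary' [0.1]
VP   -> V NP [0.8] | VP PP [0.2]
PP   -> Prep NP [1.0]
N    -> 'man' [0.6] | 'mountain' [0.2] | 'telescope' [0.2]
V    -> 'saw' [1.0]
Det  -> 'a' [0.4] | 'the' [0.6]
Prep -> 'with' [0.5] | 'on' [0.5]
""")

# ViterbiParser returns a most probable parse of the sentence.
parser = nltk.ViterbiParser(pcfg)
sentence = "Mary saw the man on the mountain with a telescope".split()
for tree in parser.parse(sentence):
    print(tree)
    print(tree.prob())   # 1.3824e-06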

Overgeneration

Let's add the pronoun "I" and the present-tense verb "see":

NP   → Mary | I
V    → saw  | see

The grammar now accepts all of the following, although only the first is grammatical English:

I see Mary ✓

Mary saw I ✗ (wrong pronoun case: should be "Mary saw me")

Mary see a telescope ✗ (no verb-subject agreement: should be "Mary sees a telescope")

Solution 1: adding more rules

1.1: Handling pronoun case

Separate rules for noun phrases in subject position (NP_S) and noun phrases in object position (NP_O).

Rules

S    → NP_S VP
NP_S → Det N
NP_S → NP_S PP
NP_O → Det N
NP_O → NP_O PP
VP   → V NP_O
VP   → VP PP
PP   → Prep NP_S
PP   → Prep NP_O

Lexicon

NP_S → Mary | I
NP_O → Mary | me
N    → man  | mountain | telescope
V    → saw  | see
Det  → a    | the
Prep → with | on
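
Translated into an NLTK grammar (a sketch under the same toolkit assumption as before), this version rejects the case error but still accepts the agreement error, which motivates step 1.2:

import nltk

# Solution 1.1: case is now distinguished, but verb-subject agreement
# is still unchecked.
grammar = nltk.CFG.fromstring("""
S    -> NP_S VP
NP_S -> Det N | NP_S PP | 'Mary' | 'I'
NP_O -> Det N | NP_O PP | 'Mary' | 'me'
VP   -> V NP_O | VP PP
PP   -> Prep NP_S | Prep NP_O
N    -> 'man' | 'mountain' | 'telescope'
V    -> 'saw' | 'see'
Det  -> 'a' | 'the'
Prep -> 'with' | 'on'
""")
parser = nltk.ChartParser(grammar)

for s in ["I see Mary", "Mary saw I", "Mary see a telescope"]:
    print(s, "->", len(list(parser.parse(s.split()))))
# I see Mary -> 1           (accepted, good)
# Mary saw I -> 0           (now rejected, good)
# Mary see a telescope -> 1 (still accepted: agreement not handled yet)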

1.2: Handling verb-subject agreement

Separate rules for noun phrases in the first person (NP_?_P1) and noun phrases in the third person (NP_?_P3).

Rules

S       → NP_S_P1 VP_P1
S       → NP_S_P3 VP_P3
NP_S_P1 → Det N
NP_S_P1 → NP_S_P1 PP
NP_O_P1 → Det N
NP_O_P1 → NP_O_P1 PP
NP_S_P3 → Det N
NP_S_P3 → NP_S_P3 PP
NP_O_P3 → Det N
NP_O_P3 → NP_O_P3 PP
VP_P1   → V_P1 NP_O_P1
VP_P1   → V_P1 NP_O_P3
VP_P1   → VP_P1 PP
VP_P3   → V_P3 NP_O_P1
VP_P3   → V_P3 NP_O_P3
VP_P3   → VP_P3 PP
PP      → Prep NP_S_P1
PP      → Prep NP_O_P1
PP      → Prep NP_S_P3
PP      → Prep NP_O_P3

Lexicon

NP_S_P1 → I
NP_O_P1 → me
NP_S_P3 → Mary
NP_O_P3 → Mary
N       → man  | mountain | telescope
V_P1    → saw  | see
V_P3    → saw  | sees
Det     → a    | the
Prep    → with | on

Solution 2: augmenting the grammar

Rules

S        → NP(n,c) VP(n) {c = Sbj}
NP(P3,_) → Det N
NP(n,c)  → NP(n,c) PP
VP(n)    → V(n) NP(_,c) {c = Obj}
VP(n)    → VP(n) PP
PP       → Prep NP(_,_)

Lexicon

NP(P3,_)   → Mary
NP(P1,Sbj) → I
NP(P1,Obj) → me
N     → man | mountain | telescope
V(P1) → saw | see
V(P3) → saw | sees
Det   → a | the
Prep  → with | on
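
NLTK's feature grammars offer one concrete way to implement this kind of augmentation. The sketch below is our own rendering of the rules above (the feature names PER and CASE, and the choice of parser, are assumptions, not from the lecture); features that are left out, such as CASE on common nouns, unify freely, which plays the role of the underscores above:

import nltk

# The augmented grammar as an NLTK feature grammar. ?p and ?c are
# variables shared within a rule, mirroring n and c above.
fg = nltk.grammar.FeatureGrammar.fromstring("""
S -> NP[PER=?p, CASE=sbj] VP[PER=?p]
NP[PER=p3] -> Det N
NP[PER=?p, CASE=?c] -> NP[PER=?p, CASE=?c] PP
VP[PER=?p] -> V[PER=?p] NP[CASE=obj]
VP[PER=?p] -> VP[PER=?p] PP
PP -> Prep NP
NP[PER=p3] -> 'Mary'
NP[PER=p1, CASE=sbj] -> 'I'
NP[PER=p1, CASE=obj] -> 'me'
N -> 'man' | 'mountain' | 'telescope'
V[PER=p1] -> 'saw' | 'see'
V[PER=p3] -> 'saw' | 'sees'
Det -> 'a' | 'the'
Prep -> 'with' | 'on'
""")
parser = nltk.parse.FeatureChartParser(fg)

def accepts(sentence):
    return any(True for _ in parser.parse(sentence.split()))

print(accepts("I see Mary"))            # True
print(accepts("Mary saw I"))            # False (case clash on the object)
print(accepts("Mary see a telescope"))  # False (agreement clash)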

Generative capacity (Chomsky hierarchy)

Grammar  Languages               Automaton                                        Production rules (constraints)
Type-0   Recursively enumerable  Turing machine                                   α → β (no restrictions)
Type-1   Context-sensitive       Linear-bounded non-deterministic Turing machine  αAβ → αγβ
Type-2   Context-free            Non-deterministic pushdown automaton             A → γ
Type-3   Regular                 Finite state automaton                           A → a and A → aB

where a is a terminal; A and B are non-terminals; α and β are (possibly empty) strings of terminals and/or non-terminals; and γ is a non-empty string of terminals and/or non-terminals.

More on ambiguity

Most of the sentences we hear seem unambiguous.
But almost every utterance contains some kind of ambiguity.
We are just very good at disambiguating!

Levels of ambiguity

Disambiguation

Probabilities, as in a PCFG, only help us choose the most likely parse (the one most common in a corpus). But that does not take context into account!

We need models to disambiguate.

Acoustic model

Language model

Mental model

A politician saying: "I am not a crook"

World model