Natural Language Interpretation

DIT411/TIN175, Artificial Intelligence

Peter Ljunglöf

26 January, 2018

What is natural language?
Why syntax, anyway?

What is natural language?

Natural language
- Any language that develops naturally in humans through use and repetition
  - e.g.: English, Swedish, Runyankole, Kangiryuarmiutun
- Imprecisely defined: is “I totally lol’ed” correct?
- Interpretation can be ambiguous
Formal language
- Specifically constructed for some purpose, e.g.: Javascript, propositional logic
- Precisely defined
  - print(1 + 2) ✅
  - print(+ 1 2. ❌
- Unambiguous, precise semantics

Natural Language Processing (NLP)

Some examples of NLP tasks:
- Information retrieval, e.g., web search
- Machine translation, e.g. Google translate
- Classification, e.g. sentiment analysis
- Information extraction, e.g. Named entity recognition

Information retrieval

Machine translation

Classification

Sentiment analysis

https://www.csc.ncsu.edu/faculty/healey/tweet_viz

Information extraction

Named entity recognition

http://www.europeana-newspapers.eu/named-entity-recognition-for-digitised-newspapers

Approaches

Rules vs Statistics
- Today many NLP tasks are powered by machine learning
- We have lots of data (corpora), fast processors, cheap storage
- Until the 1990s, most NLP systems were based on complex sets of hand-written rules
Does anyone use rule-based systems today?
- Many NLP systems still use hand-written rules
- E.g., domain-specific dialogue systems
  - Siri, Alexa, Cortana, …, use both hand-written rules and statistical NLP
  - …and not to forget Shrdlite!

Phrase-structure grammars

Words have different lexical categories:
- noun, verb, adjective, prepositions, adverbs, …
We can combine them into phrasal categories:
- “the” (determiner) + “ball” (noun) = “the ball” (noun phrase)
- “in” (preposition) + “a box” (noun phrase) = “in a box” (prep. phrase)
- “put” (verb) + “the ball” (noun phrase) + “in a box” (prep. phrase)
  = “put the ball in a box” (imperative sentence)
A grammar is a set of rules that describe which combinations are possible
- “put a ball in a box on the floor” ✅
- “put a ball in a box on the” ❌

Context-free grammars (CFG)

A Context-Free Grammar is a 4-tuple \(G = (V, \Sigma, R, S)\) where:
- \(V\) is a finite set of non-terminals (called “syntactic categories”)
- \(\Sigma\) is a finite set of terminals (disjoint from \(V\))
- \(R\) is the set of production rules \(X\rightarrow\beta\),
  where \(X\in V\) and \(\beta\in(V \cup \Sigma)^*\)
- \(S\in V\) is the start symbol
Common syntactic sugar:
- \(X \rightarrow \alpha \;|\; \beta \;|\; \gamma\) is the same as \(X \rightarrow \alpha\), \(X \rightarrow \beta\), \(X \rightarrow \gamma\)
- \(X \rightarrow \alpha? \; \beta \; \gamma?\) is the same as \(X \rightarrow \alpha \beta \gamma \;|\; \beta \gamma \;|\; \alpha \beta \;|\; \beta\)

Context-free grammar, example

A first attempt at a context-free grammar for Shrdlite:

Command → take Entity | drop it Location | move Entity Location
Entity → Quantifier Object | the floor
Object → Size? Color? Form | Object (that is)? Location
Location → Relation Entity
Quantifier → every | a | the | all
Size → large | small
Color → red | blue | white | black | …
Form → box | ball | pyramid | … | boxes | balls | pyramids | …
Relation → in | beside | under | …

This example grammar overgenerates:
- “put every bricks on the floor”
- “put all brick on the floor”

Handling agreement in CFG

CFG solution to overgeneration: add more rules
- Entity → QuantifierSG ObjectSG | QuantifierPL ObjectPL | …
- ObjectSG → Size? Color? FormSG | ObjectSG (that is)? Location
- ObjectPL → Size? Color? FormPL | ObjectPL (that are)? Location
- QuantifierSG → every | a | the
- QuantifierPL → all
- FormSG → box | ball | pyramid | …
- FormPL → boxes | balls | pyramids | …
This is how we do it in Shrdlite.

Agreement and definite clause grammar

The CFG solution is not feasible for, e.g., German:

Entity → QuantFemNomSg ObjFemNomSg | QuantMascAckSg ObjMascAckSg | …
QuantFemNomSg → die
QuantMascAckSg → den
ColorFemNomSg → rote | blaue | …
ColorMascAccSg → roten | blauen | …

German has 2 × 3 × 4 = 24 combinations of Number, Gender and Case.
Definite-Clause Grammars use attributes and unification:

Entity → Quantifier[\(g,c,n\)] Object[\(g,c,n\)] \(\Longleftarrow\) Note! Only one rule
Quantifier[fem,nom,sg] → die
Quantifier[masc,acc,sg] → den
Color[fem,nom,sg] → rote | blaue | …
Color[masc,acc,sg] → roten | blauen |

Syntactic analysis (parsing)

Problem: Given a grammar, find a derivation from S for an input string
Function from string to a list of parse results:
parse(g : Grammar, s : String) : Result[]
- 0 results: input is invalid
- 1 result: input is valid and unambiguous
- 2+ results: input is valid and ambiguous
Algorithms:
- Parsing can be formulated as a search problem
- CKY algorithm, chart parsing, probabilistic parsing, …

Shrdlite parse results

The Nearley CFG formalism lets you specify how the parse results should look like:

Command → take Entity { (d) => new TakeCommand(d[1]) }
Command → drop it Location { (d) => new DropCommand(d[2]) }
Command → move Entity Location { (d) => new MoveCommand(d[1], d[2]) }

“put a green ball beside every large box”
\(\Longrightarrow\) MoveCommand(
                       Entity(“any”, SimpleObject(“ball”, null, “green”)),
                       Location(“beside”,
                                  Entity(“all”, SimpleObject(“box”, “large”, null))))

Syntactic ambiguity

“put a ball right of a box in a box beside a table”

How many syntactic analyses?
“put (a ball) right of ((a box in a box) beside a table)”
“put (a ball) right of (a box in (a box beside a table))”
“put (a ball right of a box) in (a box beside a table)”
“put ((a ball right of a box) in a box) beside (a table)”
“put (a ball right of (a box in a box)) beside (a table)”

Levels of ambiguity

Most of the sentences we hear seem unambiguous. But almost every utterance contains some kinds of ambiguity. We are just very good at disambiguating!
Different levels of ambiguity:
- Lexical: a word can belong to multiple categories
  - “Buffalo buffalo buffalo buffalo”
  - “Bison [from] Buffalo [often] confuse [other] bison”
- Syntactic: Phrases can attach at different points in the tree
  - “I ordered a pizza with rucola”
  - “I ordered a pizza with my phone”
- Semantic: Multiple interprations
  - “Everyone loves someone”
  - \(\forall x\exists y. Love(x,y)\) or \(\exists y\forall x. Love(x,y)\) ?

Why syntax, anyway?

So far, I’ve talked about grammars and parse results
- but what do we do with them?
- the parse results themselves are never our final goal
The next step is semantics (= interpretation)
- statistical approaces:
  - e.g., named entitiy recognition, sentiment analysis, relation extraction
- logical approaches:
  - e.g., predicate logic, lambda calculus, modal logic

Semantic representation, the Shrdlite way

Shrdlite semantics is propositional logic:
- a logical description of how we want the final state to look like
  - one term for every object in the world:
    - WhiteBall, BlackBall, …
  - one predicate for every relation:
    - holding(x), inside(x,y), leftof(x,y), …
  - logical disjunction and conjunction instead of quantifiers:
    - P ∨ (Q ∧ R)
- This works because the world is finite!

Interpretation, ambiguities

Is this ambiguous? “put the white ball in the red box”
How about this? “put the ball in the red box”

Shrdlite pipeline

This is how Shrdlite goes from text input to a final plan:
1. Parsing: text input → (many) parse results
2. Interpretation: parse result + world → (many) goals
3. (Ambiguity resolution: many goals → one goal)
4. Planning: goal → plan
5. (Ambiguity resolution: many plans → one plan)

Parsing: text input \(\rightarrow{}\) parse results

function parse(input : string) : ShrdliteResult[]
- (this function is already implemented)
interface ShrdliteResult {
- input : string
- parse : Command
- interpretation : DNFFormula
- plan : string[] }

This is already implemented using the Nearley grammar and parser
- after parsing, every ShrdliteResult contains
  the input string and a parse result
- the interpretation and plan are dummy values

Interpretation: parse result + world \(\rightarrow\) goals

function interpret(parses : ShrdliteResult[], world : WorldState) : ShrdliteResult[]
- (this function is already implemented, but calls interpretCommand)
class Interpreter {
- interpretCommand(cmd : Command) : CommandSemantics
- interpretEntity(ent : Entity) : EntitySemantics
- interpretLocation(loc : Location) : LocationSemantics
- interpretObject(obj : Object) : ObjectSemantics }

This is what you have to implement in lab 2!
- the Interpreter methods should call each other
- what should the respective semantics be?

Semantics of commands: Disjunctive Normal Form

DNF = Disjunctive Normal Form = a disjunction of conjunctions of literals
(normal form = all logical formulae can be converted into this form)

type CommandSemantics = DNFFormula
class DNFFormula {
- conjuncts : Conjunction[] }
class Conjunction {
- literals : Literal[] }
class Literal {
- relation : string
- args : string[] = []
- polarity : boolean = true }

Example: the formula (p(x) ∧ q) ∨ (¬r(y,z)) is created by:

new DNFFormula([
- new Conjunction([new Literal(“p”, [“x”]), new Literal(“q”)]),
- new Conjunction([new Literal(“r”, [“y”,”z”], false)]) ])

Semantics of Objects, Entities and Locations

The semantics of an object description is a collection of
the objects that match the description:
- type ObjectSemantics = string[]
- Example: the semantics of “large box” is ["RedBox", "YellowBox"]
The semantics of an Entity or a Location is just a wrapper
around the semantics of its children:
- type EntitySemantics = {quantifier : string; object : ObjectSemantics}
- type LocationSemantics = {relation : string; entity : EntitySemantics}
- Example: the semantics of “in every large box” is
  {relation: "inside", {quantifier: "all", object: ["RedBox", "YellowBox"]}}

Semantic ambiguity

DNF inherently captures ambiguity
- “put a ball in a box”
  \(\Rightarrow\)
  inside(WhiteBall,RedBox) ∨ inside(WhiteBall,YellowBox) ∨ inside(BlackBall,RedBox) ∨ inside(BlackBall,YellowBox) ∨ inside(BlackBall,BlueBox)
But impossible interperetations should be removed
- Note that we don’t want the interpretation inside(WhiteBall,BlueBox),
  because that violates the physical laws.

Only keep valid interpretations

“put the white ball in a box on the floor”

“put the white ball in a box that is on the floor” \(\Rightarrow\) inside(WhiteBall,YellowBox)
“put the white ball that is in a box on the floor” \(\Rightarrow\) There is no white ball in a box

“in a box on the floor”: Natural interpretation

inside(WhiteBall, YellowBox)
The yellow box is already on the floor: 17 actions to complete

“in a box on the floor”: Alternative interpretation

inside(WhiteBall, RedBox) ∧ on(RedBox, floor)
The red box can be placed on the floor first: 10 actions to complete

The red box is not on the floor at the start!

Final interpretation