The Shrdlite project
In this project you will implement a dialogue system for controlling a robot that lives in a virtual block world. The robot can move around objects of different forms, colors and sizes, and it can answer questions about the world and ask for clarification whenever necessary.
The inspiration is Shrdlu, by Terry Winograd 1970. For Winograd it was a PhD project, but now there are tools available that make things easier: grammar formalisms, ontologies, planners, etc.
You will not have time to create a system that can do everything that Shrdlu can, so you only have to implement a subset of Shrdlu’s capabilities. We call it Shrdlite.
The original Shrdlu
Here are some overviews about the original Shrdlu system, including a demo video:
- An overview: http://hci.stanford.edu/winograd/shrdlu/
- Another overview: http://www.semaphorecorp.com/misc/shrdlu.html
- A demo video: http://projects.csail.mit.edu/video/history/aifilms/26-robot.mp4
- Winograd’s PhD thesis: http://dspace.mit.edu/handle/1721.1/7095
- Finally another nice paper with flowcharts of the Shrdlu grammar (not exactly the way we do it nowadays…): http://dspace.mit.edu/handle/1721.1/5791
Setup
The goal of the project is to create an interpreter and a planner so that a person can control a robot in a blocks world to move around objects, by giving commands in natural language.
To make the project more interesting, there is a web-based graphical interface from which the user can interact with the blocks world.
The interface is written in TypeScript (which compiles into Javascript), and it can be run in two different modes:
- as a HTML web application, using SVG animations for displaying the world;
- as an offline text application, where input is provided at the command line (requires that Node.JS is installed).
To be able to run the system you need to install Node.js, NPM and TypeScript. Do that. Now.
About TypeScript
TypeScript is a typed superset of Javascript that compiles to plain Javascript. It is open-source and not specific to any browser or operating system. For information about the language, please visit the official site:
Before everything, you need to install Node.js, NPM and TypeScript:
- They should already be installed on the Chalmers computers.
- Otherwise, follow the instructions at the TypeScript webpages.
- Note: You need the latest version of TypeScript (at least version 2.2).
To test the version, run
tsc -v
.
TypeScript is an extension of Javascript to make it more similar to standard object-oriented languages such as Java. Furthermore it is statically typed, to reduce programming errors. Here are some documentation and tutorials on TypeScript:
- The official documentation
- TypeScript in 5 minutes
- Learn TypeScript in 30 Minutes
- Learn X in Y minutes
- TypeScript Deep Dive – a free online book
- Introduction to TypeScript – a free online course by Udemy
- TypeScript Playground – an online TypeScript-to-Javascript compiler
You also need to set up Git and GitHub within your group.
What is missing
Most of the project is already implemented, such as two different implementations of the blocks world: one that uses SVG/HTML graphics, and another that works in a plain-text console. Furthermore, the the natural language parser is already implemented using the Nearley parsing library.
What is not implemented correctly is the natural language interpreter and the robot planner. What you are given are stubs that return a dummy interpretation and a dummy plan respectively. Your goal is to implement the interpreter and the planner so that the robot behaves as it should.
The following functions are only templates that you have to fill with intelligent code to get the project to work correctly:
Graph.ts
: the functionaStarSearch(...)
Interpreter.ts
: the classInterpreter
Planner.ts
: the classesPlanner
,ShrdliteNode
andShrdliteGraph
Shrdlite.ts
: the functionparseUtteranceIntoPlan(...)
Depending on which extension you want to implement, there might be other places in the code that need modification.
The GitHub project
Set up your GitHub project using the instructions here.
Here is how the template system should behave when you install it from scratch. Regardless of what you say, it will always return a dummy plan which moves a random block around a bit and then puts it back where it started:
The completed project
When you have completed the project you should have a system that works like this:
Select one of the example sentences and press ‘enter’, such as this one (for the “small” world):
- “put the white ball in a box on the floor”
The system parses the sentence, and tries to infer a sequence of simple actions that accomplishes the task.
Overview
The user interface is an interactive webpage that reads input from the user, does some AI stuff and presents the results in the browser. There is also a text-based version that is invoked from the command line, using Node.js. The AI stuff is done on the client in a Javascript program, which is compiled from TypeScript.
Compiling TypeScript into Javascript
There are several ways to compile TypeScript, depending on how strict you want the typechecker to be, and on what kind of Javascript module system you want to compile into.
The Shrdlite system is designed to be used with the UMD module system,
so please always use --module umd
when you compile your TypeScript files.
Furthermore, we advise you to use the strictest possible version of the typechecker,
which gives this long commandline:
tsc --module umd --strict --noFallthroughCasesInSwitch --noImplicitReturns FILE.ts
We suggest that you store this in a script or as an alias. Or that you use the GNU Make system.
Using GNU Make
There is a Makefile if you want to use the GNU Make system. Here’s what it can do:
make FILE.js
: Calls TypeScript for the given target, i.e., it compilesFILE.ts
intoFILE.js
(whereFILE
can betest-astar
,test-interpreter
,shrdlite-html
, orshrdlite-offline
)make clean
: Removes all auto-generated Javascript filesmake Grammar.ts
: Uses Nearley to compileGrammar.ne
into TypeScript (you only have to do this if you change the grammar)
Running the system
To start the browser-based system, you first compile shrdlite-html.ts
and then you open shrdlite.html
in your favourite browser.
Note that it is very difficult to test and debug a system in a browser,
so therefore it is probably better to use the command-line interface while developing the system.
To do this, you compile shrdlite-offline.ts
and run it using Node.js:
node shrdlite-offline.js WORLD EXAMPLE
where WORLD
is one of the possible example worlds (“small”, “medium” or “complex”),
and EXAMPLE
is either a number (denoting one of the example sentences)
or an utterance (within quotes). Such as:
node shrdlite-offline.js small "put the white ball in a box on the floor"
node shrdlite-offline.js medium 5
Data structures
You probably want to use some kind of collection datatype
(such as a set, a priority queue, and/or a dictionary), so the de facto-standard library
typescript-collections
is already included in the Shrdlite repository (in the directory lib/typescript-collections
).
To use a datatype from the collections library, you import it like this:
import Set from "./lib/typescript-collections/src/lib/Set";
import PriorityQueue from "./lib/typescript-collections/src/lib/PriorityQueue";
import Dictionary from "./lib/typescript-collections/src/lib/Dictionary";
Using Javascript modules in TypeScript
If you want to use standard Javascript libraries in TypeScript, you have to add a TypeScript declaration file for that library. The DefinitelyTyped library contains declaration files for several libraries, such as the following two:
node.d.ts
jquery.d.ts
In fact, these two are already included in the lib
directory.
Read more about declaration files in the
TypeScript documentation.
Javascript chart parser
The parser is generated by Nearley.
The grammar is in the file Grammar.ne
, and it is compiled into the TypeScript file Grammar.ts
.
If you plan to make any changes in the grammar, you have to install Nearley.
Overview of the Shrdlite pipeline
The Shrdlite system consists of the following pipeline:
-
Input:
- A user utterance is read from the webpage, all words separated by whitespace.
- Output: a list of strings
-
Parsing:
- The utterance is parsed into zero, one or several semantic trees.
- Note: the grammar and parser are already given to you.
- Output: a list of parse results
-
Interpretation:
- Each parse result is interpreted in the current world.
- Note: some parses might not have an interpretation, and some parses might have several interpretations.
- Output: a list of logical interpretations
-
Ambiguity resolution:
-
If the utterance results in several different interpretations, the ambiguity should be resolved in some way. The easiest (and worst) solution is to simply reject the utterance.
A better solution could be to ask a clarification question, in which case the program must exit by outputting the question, and then resuming when the user has answered.
- Depending on your preferences, you might want to put the ambiguity resolution after the planner instead.
- Output: a single interpretation (or failure, or a clarification question)
-
-
Planning:
- The interpretation is solved in the current world, resulting in a plan of basic commands that can take the current world into the goal state.
- Output: a list of basic commands
-
Output:
- The final list of basic commands is performed by the virtual robot.
The world
Shrdlite uses a 2-dimensional blocks world, not 3d. The main reason for this is that it is much easier to visualize, but also because spatial reasoning becomes easier.
The world contains a floor, and a number of objects. The objects can be on top of or inside each other (if the form and size permits). The goal of the “game” is to move around the objects in any way the user likes, a bit like Minecraft or Lego. This can only be done by talking to a robot arm which can pick up and put down objects.
The world contains at least the following forms, colors and sizes:
- Forms: Bricks, planks, balls, pyramids, boxes and tables.
- Colors: Red, black, blue, green, yellow, white.
- Sizes: Large, small.
To simplify things, the floor only has room for at most N objects at the same time. We call each of these places a stack or a column, and number them 0…N-1.
This means that internally, the world can be represented as a list of
stacks, where each stack is a list of objects, where each object has a
unique identifier. This is not the only possible representation
(another is to e.g. use a set of logical statements), but the one we use here.
This is how the world state is implemented in World.ts
:
interface WorldState {
stacks: string[][];
holding: string | null;
arm: number;
objects: { [s:string]: ObjectDefinition; };
examples: string[];
}
Each stack is a list of strings, where the strings themselves are unique identifiers for the objects in the world. The state also contains information about which object the robot arm is holding (if any), the position of the arm, descriptions of each object, and a list of example utterances.
There are four example worlds implemented in the file ExampleWorlds.ts
:
“small”, “medium”, “complex” and “impossible”.
The impossible world contains examples that break the physical laws in different ways,
so it is not supposed to be used.
But the other three worlds can be used in both the browser version and the text-based version.
You can also add new worlds to the file ExampleWorlds.ts
, it should be fairly self-explanatory.
Physical laws
The world is ruled by physical laws that constrain the placement and movement of the objects:
- The floor can support at most N objects (beside each other).
- All objects must be supported by something.
- The arm can only hold one object at the time.
- The arm can only pick up free objects.
- Objects are “inside” boxes, but “ontop” of other objects.
- Balls must be in boxes or on the floor, otherwise they roll away.
- Balls cannot support anything.
- Small objects cannot support large objects.
- Boxes cannot contain pyramids, planks or boxes of the same size.
- Small boxes cannot be supported by small bricks or pyramids.
- Large boxes cannot be supported by large pyramids.
There is an example world in the project template called “Impossible”, which gives examples of bricks that break the physical laws in some way.
Spatial relations
The following spatial relations can be used to describe how objects are to be positioned:
- x is on top of y if it is directly on top – the same relation is called inside if y is a box.
- x is above y if it is somewhere above.
- x is under y if it is somewhere below.
- x is beside y if they are in adjacent stacks.
- x is left of y if it is somewhere to the left.
- x is right of y if it is somewhere to the right.
Note that these definitions are not set in stone; if you want you can use different definitions.
The basic commands
A plan is a sequence of basic commands, and there are only four of them:
- left: Move the arm one step to the left.
- right: Move the arm one step to the right.
- pick: Pick up the topmost object in the stack where the arm is.
- drop: Drop the object that you’re currently holding onto the current stack.
Parsing
User utterance interpretation comes in two steps. First the utterance has to be parsed. To ease the burden, the grammar and parser is already given to you, using the Javascript library Nearley:
The parse result is a Javascript object reflecting the syntactic structure of the input utterance. The object doesn’t contain all details of the specific grammar, but instead only the important semantic parts.
Ambiguities
Note that the grammar is ambiguous, meaning that it can return several parse results for a single input utterance. This is on purpose, since natural language often is ambiguous. Here is an example utterance that returns two trees:
- “put the white ball in a box on the floor”
The two trees correspond to the meaning of the following two unambiguous utterances:
- “put the white ball that is in a box on the floor”
- “put the white ball in a box that is on the floor”
The parser module returns all possibilities with the hope that the interpretation module can disambiguate the intended meaning in the current world. For example, if there is no ball inside any box, then the first meaning can be filtered out.
Overview of the grammar
Here is a quick overview of how the grammar works. All utterances start with the nonterminal Command.
- Command: There are three kinds of commands:
- “take/grasp/pick up” an Entity
- “move/put/drop” “it” at a Location
- “move/put/drop” an Entity to a Location
- All commands can be preceded by an optional “will/can/could you”, and preceded or followed by an optional “please”.
-
Location: A location consists of a Relation followed by an Entity.
- Entity: There are two kinds of entities:
- Normally it consists of a Quantifier followed by an Object, where both must agree in grammatical number (singular/plural).
- But the Shrdlite world also has a special entity, “the floor”.
- Object: There are two ways of forming an object:
- A basic object consists of a Size (optional), a Color (optional), and a Form (required).
- Alternatively, an object can consist of an Object followed by a relative clause, i.e., “that is/are” Location, or just Location.
-
Quantifier: A determiner such as “a/an”, “any”, “the”, “all”, “every”, etc.
-
Relation: A preposition such as “beside”, “left of”, “above”, etc.
-
Size: An adjective such as “tiny”, “small”, “large”, etc.
-
Color: Also an adjective, “black”, “white”, “blue”, etc.
- Form: A noun such as “object”, “ball”, “box”, etc. in singular – the plural variants are “objects”, “balls”, “boxes”, etc.
Detailed description of the phrase-structure rules
The grammar is specified in a special grammar file which is compiled into Javascript by the Nearley library.
Here is a quick explanation of the Shrdlite grammar, as it is defined in the file Grammar.ne
.
All types and classes are defined in the file Types.ts
.
You don’t have to make any changes to the grammar or parser (unless you want to augment the grammar with new commands or linguistic constructions).
Command
Either the robot can pick up something, drop the thing it’s holding, or move something to a new location:
command --> ("take"|"grasp"|"pick up") entity
| ("move"|"put"|"drop") "it" location
| ("move"|"put"|"drop") entity location
The parser returns a TypeScript object conforming in one of the Command
classes:
type Command = TakeCommand | DropCommand | MoveCommand;
class TakeCommand {
constructor(entity : Entity)
}
class DropCommand {
constructor(location : Location)
}
class MoveCommand {
constructor(entity : Entity, location : Location)
}
Using phrase-structure grammar terms, the command is an imperative phrase (which in English is the same as a verb phrase). The verb “take” is a transitive verb, taking a noun phrase as argument, while “move” is a ditransitive verb, taking a noun phrase and an adverbial as arguments.
Entity
A normal entity consists of a quantifier followed by an object description:
entity --> quantifier[Number] object[Number]
But there is also a special entity that can only be referred to in definite singular form, without any recursive structures:
entity --> "the" "floor"
The arguments to a quantified entity have to be both singular or both plural,
i.e., they have to agree in grammatical number.
The parser returns an object in the class Entity
:
class Entity {
constructor(quantifier : string, object : Object)
}
In phrase-structure grammar, the entity corresponds to a noun phrase, which in turn is constructed by a determiner followed by a complex noun.
Location
A location is a relation followed by an entity:
location --> relation entity
The parser returns an object in the class Location
:
class Location {
constructor(relation : string, entity : Entity)
}
Using phrase-structure grammar terms, the location is a prepositional phrase which in turn is a kind of adverbial. The location consists of a relation, which is a preposition, and an entity (which we already know is a noun phrase).
Object
An object can either be simple, consisting of an optional size, optional color and a form:
object[Number] --> size? color? form[Number]
The size and color happen to be adjectives and the form is a noun. Since English adjectives aren’t inflected for number, it’s only the form that can be either singular or plural.
An object can also be complex:
object[Number] --> object[Number] "that is/are"? location
Note that the number of the complex object depends solely on the number of the immediate object. This is because the meaning of the complex object is that the location “filters out” possibilities that are given by the immediate object.
Also note that the relative pronoun+copula (“that is/are”) is optional, and it is this optionality that makes utterances potentially ambiguous.
Depending on the type of object, the parser returns either a RelativeObject
or a SimpleObject
:
type Object = RelativeObject | SimpleObject;
class RelativeObject {
constructor(object : Object, location : Location)
}
class SimpleObject {
constructor(form : string, size : string|null, color : string|null)
}
Note the order of the arguments to SimpleObject
:
size
and color
comes last because they are optional.
The remaining grammar rules
The rest of the grammar consists mainly of lexical rules. Some of the lexical items are synonyms, such as the relations “under” and “below”:
relation --> "under" | "below"
| "beside"
| ...
The lexical rules return an atom (i.e., a TypeScript string),
and synonyms return the same string
(i.e., both “under” and “below” return the atom under
as parse result).
Interpretation
The parser can return several parse results, and each of them should be interpreted in the current world. This interpretation will hopefully filter out some parse trees that simply cannot be understood, but it might also reveal that one parse tree is ambiguous in the current world.
Assume the ambiguous example utterance from above:
- “put the white ball in a box on the floor”
The given grammar assigns the utterance the following two possible parses:
-
put the white ball that is in a box on the floor:
MoveCommand( Entity("the", RelativeObject( SimpleObject("ball", null, "white"), Location("inside", Entity("any", SimpleObject("box", null, null))))), Location("ontop", Entity("the", SimpleObject("floor", null, null))))
-
put the white ball in a box that is on the floor:
MoveCommand( Entity("the", SimpleObject("ball", null, "white")), Location("inside", Entity("any", RelativeObject( SimpleObject("box", null, null), Location("ontop", Entity("the", SimpleObject("floor", null, null)))))))
The goal of the interpreter is to convert a parse result into a logical interpretation of the final goal of the command. This can be done in two steps:
-
First you find out what objects that the different parts of the parse are referring to.
- In the example there are three different objects mentioned – the ball, a box and the floor. For the first parse to be valid, there should already exist a ball that is inside a box in the current world. For the second parse to be valid, there must already be a box that is on the floor.
-
When you have identified the different objects, you formulate the goal by looking at the destination of the movement command.
-
For the take and put commands, the procedure is similar but different…
Goals
Now, assume that the world consists of two balls (LargeWhiteBall, SmallBlackBall), one table (LargeBlueTable), and three boxes (LargeRedBox, LargeYellowBox, SmallBlueBox). The white ball, the table and the yellow box are on the floor, while the red box is on the table, the small box is in the yellow box, and the black ball is in the small box. Like this:
The interpretation of the example utterance might go like this:
-
Parse (1) cannot have an interpretation in the current world, since there is no white ball in any box.
-
Parse (2) does have an interpretation, where the white ball is LargeWhiteBall and the only box on the floor is LargeYellowBox. The resulting goal is that LargeWhiteBall should be inside of LargeYellowBox:
- inside(LargeWhiteBall, LargeYellowBox)
Conjunctive goals
The goal inside(LargeWhiteBall, LargeYellowBox) does not say that the box must remain on the floor, which means that the planner is allowed to move LargeYellowBox to another place if it thinks this is a better thing to do.
If we want to ensure that the box is not moved from the floor, we can state this as a conjunctive goal:
- inside(LargeWhiteBall, LargeYellowBox) ∧ ontop(LargeYellowBox, floor)
Disjunctive goals
If, on the other hand, box LargeRedBox had also been on the floor, then there would be two possible boxes that matched. The resulting interpretation would then become:
- inside(LargeWhiteBall, LargeRedBox) ∨ inside(LargeWhiteBall, LargeYellowBox)
Now if we want to ensure that the boxes stay on the floor, this can be done via a disjunctive conjunctive (DNF) formula:
- [inside(LargeWhiteBall, LargeRedBox) ∧ ontop(LargeRedBox, floor)]
∨ [inside(LargeWhiteBall, LargeYellowBox) ∧ ontop(LargeYellowBox, floor)]
Quantifiers
One of the main problems with natural language interpretation is to handle quantification correctly. The Shrdlite grammar can recognize three different quantifiers: the, any and all, and they should all be interpreted differently.
The minimum requirement of your interpreter (and planner) is that it can handle the singular any quantifier (it is fine if you use the same interpretation for both the and any).
If you want your interpreter to be more interesting, it (and the planner) should also be able to handle the all quantifier, and distinguish between the and any. The main difference between these two is that the implicitly requires that there is only one object that fits the description. If there are more than one possible interpretation of a the quantified object, then the system can, for example, ask a clarification question.
Clarification questions
If the user’s utterance is ambiguous in the current world, i.e., if there are several parse trees that can be translated into goals, then it is helpful if the system can ask a clarification question.
To be able to interpret the answer to a clarification question, the system needs to know which state it was in before it asked the question. For this purpose, it might be necessary for you to extend the definition of the world state.
Detailed description of semantic interpretation
Parsing can be viewed as a function from strings to syntactic structures. In fact, from each nonterminal X in the grammar we can conceptually define a separate parser parseX that takes a string and returns a list of syntactic objects of type X:
function parseCommand(input : string) : Command[]
function parseEntity(input : string) : Entity[]
function parseLocation(input : string) : Location[]
function parseObject(input : string) : Object[]
(Note: this is not how the NEarley parser works, but I’m just saying that a parser could work like this).
In a similar way we can view semantic interpretation as a function from syntactic to semantic structures. For each nonterminal X, the semantic interpretation would then be a function interpretX that takes a syntactic object of type X and returns a semantic object XSemantics:
function interpretCommand(syntax : Command) : CommandSemantics
function interpretEntity(syntax : Entity) : EntitySemantics
function interpretLocation(syntax : Location) : LocationSemantics
function interpretObject(syntax : Object) : ObjectSemantics
This means that for each syntactic structure (Command, Entity, Location, Object) we have to decide what the corresponding semantic type should look like.
Semantic types for Shrdlite
For the Shrdlite application, these are possible definitions of the semantic types:
type CommandSemantics = DNFFormula
type EntitySemantics = {quantifier : string; object : ObjectSemantics}
type LocationSemantics = {relation : string; entity : EntitySemantics}
type ObjectSemantics = string[]
-
Commands are the toplevel structures, so they should return the main interpretation, which will be a logical formula in disjunctive normal form:
DNFFormula([Conjunction([Literal(...), ...]), Conjunction(...), ...])
-
The semantics of an object description (e.g., “white ball in a box”) should return all matching objects (i.e., all white balls that are in boxes), so the semantic type can be a list of object ids:
["LargeRedBox", "LargeYellowBox"]
-
It is difficult to associate a concrete semantics to Entities or Locations, so one possibility is simply to store the quantifier or the relation and let the caller handle the deduction:
{relation: "inside", {quantifier: "any", object: ["LargeRedBox", "LargeYellowBox"]}}
The interpretation context
The interpretation can only be done in the context of the current world state.
Our solution is to create an interpretation class that has the current world state
as an instance variable, and let all functions interpretX
be methods in that class:
class Interpreter {
constructor(world : WorldState)
interpretCommand(syntax : Command) : CommandSemantics {
...code for interpretCommand...
}
interpretEntity(syntax : Entity) : EntitySemantics {
...code for interpretEntity...
}
...etc...
}
Goals
Given the same world state, what should the semantic interpretation of a command
(such as “put all small balls in a box”) be?
The method Interpreter.interpretCommand
returns an object of the type CommandSemantics
,
which we already decided was a formula in disjunctive normal form:
interpretCommand(cmd : Command) : CommandSemantics
type CommandSemantics = DNFFormula
If there’s an interpretation error, the function should throw an error with a string description.
If the function finds an interpretation, then it is returned as a formula in disjunctive normal form. It consists of a list of conjunctions, which in turn consists of a list of literals:
class DNFFormula {
constructor(conjuncts : Conjunction[])
}
class Conjunction {
constructor(literals : Literal[])
}
class Literal {
constructor(relation : string, args : string[], polarity : boolean = true)
}
So, the suggested interpretation is a list of list of “literals”, where
a literal
is an atomic formula or its negation.
Here are some examples of literals, together with how they are written as Literal
s:
ontop(a,b) | Literal("ontop", ["a","b"]) |
holding(q) | Literal("holding", ["q"]) |
¬leftof(c,d) | Literal("leftof", ["c","d"], false) |
p | Literal("p", []) |
So, why a list of list of literals? The idea is that it is a logical formula in disjunctive normal form. The following are translations of some logical formulae in disjunctive normal form:
P ∧ Q | DNFFormula([Conjunction([P, Q])]) |
Q ∨ R | DNFFormula[Conjunction([Q]), Conjunction([R])]) |
(P ∧ Q) ∨ R | DNFFormula([Conjunction([P,Q]), Conjunction([R])]) |
P ∧ (Q ∨ R) | …must be converted to DNF first… |
Using DNFFormula
as the semantic result of a command allows us to distinguish between the following commands (assuming a,b are balls and e,f are boxes):
“put a small ball beside a box” | beside(a,e) ∨ beside(a,f) ∨ beside(b,e) ∨ beside(b,f) | DNFFormula([Conjunction([beside(a,e)]), Conjunction([beside(a,f)]), Conjunction([beside(b,e)]), Conjunction([beside(b,f)])]) |
“put all small balls beside all boxes” | beside(a,e) ∧ beside(a,f) ∧ beside(b,e) ∧ beside(b,f) | DNFFormula([Conjunction([beside(a,e), beside(a,f), beside(b,e), beside(b,f)])]) |
Exercise suitable for discussion within the project groups:
- What should be the semantic interpretation of the comand “put all small balls in a box” / “put every small ball in a box”?
- What about “put a small ball in all boxes” / “put a small ball in every box”?
Ambiguous interpretations
Sometimes an utterance is syntactically ambiguous, and each syntactic structure can have a distinct a semantic interpretation in the current world. An example is the second example in the small world:
- “put the black ball in a box on the floor”
This utterance has two possible syntactic analyses, corresponding to the following groupings of “a box”:
- “put the black ball in a box on the floor”
- “put the black ball in a box on the floor”
Both of these analyses have a distinct interpretation in the small world: Either the user wants the robot to put the black ball (which happens to be in a box) on the floor, or the user means that the black ball should end up in the yellow box (which is the only box that is on the floor).
This utterance is therefore ambiguous, and you have to handle that in some way. One variant is to just report that it is ambiguous and refuse to do anything. But there are of course more elaborate ways of handling ambiguities, such as by asking the user a clarification question. In that case you have to make more modifications to the code.
Interpretation failures
Sometimes a specific syntactic structure does not have a semantic interpretation in the current world. An example is the first example in the small world:
- “put the white ball in a box on the floor”
This utterance has two possible syntactic analyses (just as for the black ball). However, the first analysis cannot be interpreted in the small world, since there are no white balls in any boxes. How can we accomplish this?
Our solution is to let the interpretX(...)
methods throw an exception if there is no interpretation.
The calling method interpretCommand(...)
can then handle this by interpreting all possible parses,
and only report an interpretation error if there are no interpretations whatsoever.
Montague semantics
The semantic interpreter in your Shrdlite project will be a very light-weight variant of Montague semantics. Read more about these things in chapter 2 of Ted Briscoe’s introduction to syntax and semantics:
A final note: Perhaps you noticed that the interpreter doesn’t return a list of XSem objects (which the parser does). The reason for this is that you won’t need that in your Shrdlite project (at least not in the basic project), but Briscoe gives in section 2.3 some examples where the same syntactic structure can give rise to several semantic interpretations.
Planning
The planner should take a representation of the goal, and a representation of the current world.
Let’s continue with our example utterance and world as before, giving the same goal:
- inside(LargeWhiteBall, LargeYellowBox)
Now, the planner should take the world representation and the goal and return a sequence of the basic actions (rlpd). Here is a possible solution (assuming that the robot arm starts in position 0):
r r r p r d l p l d l l p r r r d
Or, in other words:
- First it moves SmallBlackBall to the floor (
r r r p r d
), - then SmallBlueBox to the floor (
l p l d
), - and finally it moves LargeWhiteBall inside LargeYellowBox (
l l p r r r d
).
Note that to be able to complete this, the planner also needs a representation of the background knowledge (e.g., which kinds of objects can be on top of, or inside, which objects).
Planning as A* search
To solve the planning you should encode the world and the actions as a search graph,
so that you can call your own A* implementation. The planner file contains
templates for the classes Planner
, ShrdliteNode
and ShrdliteGraph
.
The class Planner should implement the method makePlan, which takes an
interpretation as argument and returns a plan (a list of strings):
class Planner {
constructor(world : WorldState)
makePlan(interpretation : DNFFormula) : string[]
}
class ShrdliteNode
class ShrdliteGraph implements Graph<ShrdliteNode>
The plan and the GUI
The Javascript GUI wants the final plan as a list of strings, where
each string is either a basic action (e.g., r
or d
),
or a system utterance that will be printed as part of the dialogue.
Like this:
["First I move the black ball out of the way",
"r", "r", "r", "p", "r", "d",
"Then I move the small box out of the way",
"l", "p", "l", "d",
"Finally I put the white ball in the yellow box",
"l", "l", "p", "r", "r", "r", "d"]
Grading
To be acceptable, your program must fulfill at least the following:
-
It must obey all requirements of the world, in particular the physical laws and the spatial relations.
-
It must try to disambiguate ambiguous utterances by interpreting each parse tree in the current world. If there are several possible meaningful parse trees, it must at least fail by saying that the utterance is ambiguous.
-
It must translate a meaningful parse tree into a goal in a logical representation.
-
It does not have to handle the plural all quantifier, and it may use the same meaning for the as the singular any.
-
It must be able to plan a command sequence that solves a goal in a small world still obeying the physical laws.
-
The planner should at least be an A* search forward planner, using some kind of heuristics. (Note: with heuristcs we mean a nonzero meaningful heuristics, e.g., h(n)=100 is not meaningful even if it’s nonzero).
To receive a higher grade than 3 / G, the critera are like follows:
-
Grade 4: Your code is readable and well commented, and you have implemented at least one large extension, or two small extensions.
-
Grade 5 / VG: Your code is highly readable, efficient and very well commented, and you have implemented at least two large extensions, which are independent of each other.
Note that your individual grade might differ from the grades of your fellow group members, depending on your personal contributions and how well you interacted with your group.
Extensions
Here are some ideas for possible extensions. Note that most of them can be done in different ways, more or less advanced, so it’s not directly clear from these description if an extension is “small” or “large”. You have to discuss with your supervisor.
(Mostly) small extensions
-
To be able to handle all quantifiers in a sensible manner.
-
Add an new Location or a new Command, e.g. “between a box and a ball”, or “stack up all tables”. Note that this might be a large extension, depending on how complex grammar and semantics you need.
-
Handle deep nesting in a sensible manner: e.g., interpreter test n:o 38, “put a brick on a brick on a brick on the floor” has 5 parses, and one of them has a strange naive interpretation.
-
Improved search: Either a better heuristics, compared to the baseline N × (n:o objects above X + n:o objects above Y), or a better search algorithm, such as IDA* or SMA*. Note that some search algorithms can be seen as large extensions.
-
More fine-grained cost calculation: e.g., take the height of the stacks into account, so that you can calculate the time for picking up / dropping down objects.
-
To make the planner describe what it is doing, in a way that is understandable to humans. One important part of this will be to know how to describe the different objects in a concise way (e.g., if there is only one ball, then you don’t have to say that it is white). Note that this can also be a large extension: if you not only describe what it is doing (“moving the X on top of Y”), but also describe why (because you need to clear the table underneath X); and/or if it describes equivalent objects by its neighbours (e.g., in the medium world there are 3 large green bricks).
(Mostly) large extensions
-
Adding new quantifiers and handling them in a sensible manner: e.g., “all Xs”, “X and Y”, “two Xs”, “X or Y”, “all Xs except Y”
-
To ask the user clarification questions when the utterance is ambiguous, e.g. “do you mean the large red pyramid or the small green?”. Note that the user must be able to give a sensible answer to the system question.
-
Anaphoric references, such as “it” or “one”: e.g. “put the ball in a box beside it”, or “put the plank under a brick on top of it”, or “put a plank on a brick” – “put a box on it”, or “put the blue box in the yellow one”, or “put a white ball in blue box on a yellow one”
-
New commands for information-seeking: e.g., “where is X?”, “find all Xs”, “describe X”, “why did you move X?”. Note that the system has to be able to describe objects compactly, and using neighbours if necessary.
-
System suggests interesting utterances, e.g., utterances that make for interesting problems
-
Advanced planning algorithms: e.g., bidirectional search, convert to CSP, non-trivial local search, anytime search
-
To make the system suggest interesting commands, i.e., commands that give rise to complicated planning problems.
-
Changes to the world: e.g., several robot arms, or a 3d world, or varying-size objects covering several columns, or the arm only lifts as high as needed, or turn Shrdlite into a game, etc.
-
Adding uncertainties, e.g., actions that can fail, partial observability, etc.
-
Or something else, but talk to your supervisor first!