The 6 Evolutionary Stages of Chatbot AI


[I originally posted this elsewhere; I’m archiving it here.]

Suddenly this year, there are ‘conversational interfaces’ and ‘chatbots’ on offer from companies big and small. Many claim to be breaking new ground, and that’s surely true in some cases. For example, Google’s recent release of SyntaxNet uses Deep Learning in a new way to achieve the greatest accuracy ever attained for syntactic parsers. The release of that tool allows companies whose chatbots have reached a certain stage of development to switch out their own syntactic parser and focus even more intently on the really interesting problems. But how can we assess the smartness (i.e. potential usefulness) of these bots?

Many of the techniques used in these bots are new to industry but have been known to AI researchers for a decade or more. So an AI researcher can make pretty good guesses about how each bot works based on what features are promoted in its demos. What I’m going to do in this article is give you some tools for drawing the same inferences.

Let’s adopt a guiding metaphor that bot intelligence follows a natural evolutionary trajectory. Of course, bots are artifacts, not creatures, and are designed, not evolved. Yet within the universe of artifacts, bots have more in common with, say, industrial design than fashion, because functionality determines what survives. Like the major branches of Earth’s genetic tree, there are bundles of features and functionality that mark out a qualitative change along the trajectory of development. How many stages are there in the evolution of chatbot AI? Based on my years of research in NLP and discussion with other experts, my answer is six. Let’s look at each.

Stage 1, The Character Actor: Bots at this stage look for key phrases in what you type (or say, which they process with speech recognition) and give scripted responses. The primary effort that goes into these bots is making the scripted responses entertaining so you’ll want to continue the interaction. But because they only look for key phrases, they have a shallow understanding of what you’re saying and cannot go in-depth on any topic, so they tend to shift topics frequently to keep the interaction interesting. Bots at higher stages also need to provide interesting interaction, so it’s not a good idea to skip stage 1 entirely. There are no really good techniques at later stages for conveying personality or emotion, or for being entertaining.
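Here’s a minimal sketch of the key-phrase matching described above. The rule format, the function names, and the canned replies are all made up for illustration; a real stage 1 bot would have far more rules and far better jokes.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stage-1 rule: a key phrase and the scripted reply it triggers.
using Rule = std::pair<std::string, std::string>;

// Lower-case a copy of the input so matching ignores capitalisation.
std::string toLower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Return the scripted reply for the first rule whose key phrase occurs in the
// input, or a topic-shifting fallback when nothing matches.
std::string scriptedReply(const std::vector<Rule>& rules, const std::string& userInput) {
    const std::string lowered = toLower(userInput);
    for (const Rule& rule : rules) {
        if (lowered.find(rule.first) != std::string::npos) {
            return rule.second;
        }
    }
    return "Anyway, seen any good movies lately?";
}

// Illustrative rules only; key phrases must be written in lower case.
std::vector<Rule> demoRules() {
    return {
        {"hello", "Well, hello there! What's on your mind?"},
        {"weather", "I never go outside, so every day is sunny for me."},
    };
}
```

Notice that the bot never looks at anything but the key phrase, which is why the fallback has to change the subject.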

Stage 2, The Assistant: Bots at this stage can do narrow tasks they’ve been explicitly programmed to do, like take a pizza order or make a reservation. Each kind of task is represented with a form (i.e., a tree of labelled slots) that must be filled out so the bot can pass the form values to a web service that completes the order. These bots use rules similar to stage 1, but the rules have been split down the middle into two sets. One set looks for key phrases in what you type, as before, and uses parts of what you write to fill in the form. The other set checks which parts of the form still need filling and prompts for answers, or, when the form is full, calls a web service.
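The two rule sets can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the form is a flat map rather than a tree of slots, the slot names and vocabulary are invented, the input is assumed to be lower case, and the web service call is replaced by a confirmation string.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical pizza-order form: slot name -> value (empty means unfilled).
// A flat map stands in for the tree of labelled slots.
struct OrderForm {
    std::map<std::string, std::string> slots{{"size", ""}, {"topping", ""}};
};

// Rule set 1: look for key phrases and copy them into the matching slots.
void fillSlots(OrderForm& form, const std::string& userInput) {
    static const std::map<std::string, std::vector<std::string>> vocabulary{
        {"size", {"small", "medium", "large"}},
        {"topping", {"mushroom", "pepperoni", "cheese"}},
    };
    for (const auto& entry : vocabulary) {
        for (const std::string& phrase : entry.second) {
            if (userInput.find(phrase) != std::string::npos) {
                form.slots[entry.first] = phrase;
            }
        }
    }
}

// Rule set 2: prompt for the first empty slot, or report the completed order
// (a real bot would call the ordering web service at that point).
std::string nextAction(const OrderForm& form) {
    for (const auto& slot : form.slots) {
        if (slot.second.empty()) {
            return "What " + slot.first + " would you like?";
        }
    }
    return "Ordering a " + form.slots.at("size") + " " +
           form.slots.at("topping") + " pizza.";
}
```

A conversation is then just a loop: call fillSlots on each thing the user says, and reply with whatever nextAction returns.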

This form-filling design has been part of VoiceXML, a technology that underpins automated phone services, since the early 2000s. And it’s been in AI research systems since the late 90s. Recent work in machine learning and NLP certainly makes this design work better than it did in those old systems.

Stage 3, The Talking Encyclopedia: Bots at this stage no longer require a predefined form but instead build up a representation of what you mean word-by-word (aka ‘semantic parsing’). This is the stage that Google’s SyntaxNet is designed to help with, because these systems usually try to identify the ‘part of speech’ of each word and how the words relate to each other first, as a guide to extracting meaning. SyntaxNet can do that first step, leaving the hard, interesting work of extracting meaning that distinguishes bots at this level.

A common use for these bots is for looking up facts, such as the weather forecast two days from now, or the third-tallest building in the world, or a highly rated Chinese restaurant nearby. The way they make this work is that they build a database query as part of doing the semantic parse, leveraging the syntactic parse result. For example, nouns like ‘restaurant’ can be indexed to a database table of restaurants, and adjectives like ‘Chinese’ and ‘nearby’ add constraints for querying that database table.
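As a rough sketch of that idea, the function below turns an already-tagged phrase into a SQL-like query string. Everything here is an assumption made for illustration: the TaggedWord type, the table and column names, and the 2 km reading of ‘nearby’. A real semantic parser would look words up in a lexicon rather than hard-code them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One word of a (hypothetical) syntactic parse: the word and its part of speech.
struct TaggedWord {
    std::string word;
    std::string partOfSpeech;  // "NOUN", "ADJ", or anything else
};

// Build a SQL-like query string: the noun selects the table, and each
// adjective adds a constraint. Table and column names are made up.
std::string buildQuery(const std::vector<TaggedWord>& parse) {
    std::string table;
    std::vector<std::string> constraints;
    for (const TaggedWord& tagged : parse) {
        if (tagged.partOfSpeech == "NOUN" && tagged.word == "restaurant") {
            table = "restaurants";
        } else if (tagged.partOfSpeech == "ADJ" && tagged.word == "Chinese") {
            constraints.push_back("cuisine = 'Chinese'");
        } else if (tagged.partOfSpeech == "ADJ" && tagged.word == "nearby") {
            constraints.push_back("distance_km < 2");
        }
    }
    if (table.empty()) {
        return "";  // no noun we know how to index
    }
    std::string query = "SELECT * FROM " + table;
    for (std::size_t i = 0; i < constraints.size(); ++i) {
        query += (i == 0 ? " WHERE " : " AND ") + constraints[i];
    }
    return query;
}
```

Feeding it the tagged words for ‘Chinese restaurant nearby’ yields a query over the restaurants table with two constraints joined by AND.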

Stage 4, The Historian: Bots in all the stages so far aren’t good at understanding how one of your sentences connects to another (unless both sentences are focused on a common task). Bots can’t even do a good job of knowing what you’re talking about when you say ‘he’ or other pronouns. These limitations are due to not having a usable record of how the exchanges have gone and what they’ve been about. What distinguishes a bot at stage 4 is that it maintains a ‘discourse history’.
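To make ‘discourse history’ a bit more concrete, here’s a toy version that only handles pronouns. It’s a sketch under heavy assumptions: the class and its methods are invented, each mention is tagged by hand with the pronoun that could refer to it, and ‘most recent match wins’ stands in for the much richer salience models used in research systems.

```cpp
#include <string>
#include <vector>

// Hypothetical record of something mentioned earlier in the conversation.
struct Mention {
    std::string name;
    std::string pronoun;  // the pronoun that could later refer to it
};

// A toy discourse history: mentions kept in the order they occurred.
class DiscourseHistory {
public:
    void remember(const std::string& name, const std::string& pronoun) {
        mentions_.push_back({name, pronoun});
    }

    // Resolve a pronoun to the most recent compatible mention, or return
    // an empty string when the history offers no candidate.
    std::string resolve(const std::string& pronoun) const {
        for (auto it = mentions_.rbegin(); it != mentions_.rend(); ++it) {
            if (it->pronoun == pronoun) {
                return it->name;
            }
        }
        return "";
    }

private:
    std::vector<Mention> mentions_;
};
```

Bots at earlier stages have nothing like mentions_ to consult, which is why ‘he’ stumps them.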

In addition to those benefits, research suggests that this is the best stage to integrate gesture understanding with language understanding. The reason gesture understanding fits at this stage is that pointing gestures are a lot like pronouns, and more complex demonstrative gestures (like miming how to get a bottle cap off) build on pointing-like gestures similar to the way sentences build off pronouns.

Remember how last year’s buzz was about virtual and augmented reality? In the near future the buzz will be about giving chatbots a 3D embodiment which can understand you just from your body language. After all, conversational interfaces should be easier to use than others because you just talk, as one does in everyday life. If you reduce the amount of talk needed to just what people use with each other, that’s really smart/useful.

Stage 5, The Collaborator: The Assistant in stage 2 recognises requests that match the tasks it knows about, but it doesn’t think about what you’re ultimately trying to achieve and whether it could advise a better path. The best example of such a bot to be found in AI research is one that, when asked which platform to catch a certain train, answers that the train isn’t running today. It’s certainly possible to get bots at earlier stages to check for specific problems like this one, but only stage 5 bots have a general ability to help in this way.

Bots at this stage seem to fulfill the ideal of the helpful bot, but it’s still not what the public or researchers dream of. That dream bot is found in the next stage.

Stage 6, The Companion: This is a mythical stage that all bot creators aspire to but which only exists so far in fiction, like the movie Her. Ask yourself what it is about the computer character that people wish were real. I believe her empathy and humor are part of it, but that her core virtue is that she understands his situation, and what could help him through it, and she does it. Would a real bot require any technological advance beyond stage 5 to help in similar ways? It seems not. What bots do need to reach this stage is a model of everyday concerns that people have and how to provide counsel about them. That core skill, leavened with humor from stage 1, could provide something close to companionship.

Could a bot be a good counselor? It’s certainly a serious question, because the elders, teens, parents, and anxious wage-earners of today and coming years are outstripping available professional care. Conversational AI has potential to deliver real benefits beyond simplifying interfaces. Let’s work with professional counselors and make it happen.

I hope this article has provided a way for us all to identify the strengths in different chatbots and talk about them with workable shorthand terms. It would be great to see discussions about some particular chatbot being strong in stage 4 but boring because it neglected stage 1, for example. That would give everyone a chance to influence the directions of this very promising technological advance.

C++ move semantics and the copy-swap idiom

Following the Resource Acquisition Is Initialization technique, the copy-swap idiom for assigning the contents of one object to another may be the next-richest wisdom-in-a-nutshell technique. It’s a great way to learn about the “move semantics” introduced in C++0x (aka the update to the language that was expected to be delivered before 2010 but was actually a little late).
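Here’s a minimal sketch of the idiom, using a made-up IntBuffer class that owns a heap array. It’s one common way to write it, not the only one, and it needs a C++11 compiler (e.g. -std=c++11 with GCC).

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical RAII class that owns a heap-allocated array of ints.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t size = 0)
        : size_(size), data_(size ? new int[size]() : nullptr) {}

    // Copy constructor: deep-copies the other buffer's elements.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_),
          data_(other.size_ ? new int[other.size_] : nullptr) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Move constructor: start empty, then steal the other buffer's contents.
    IntBuffer(IntBuffer&& other) noexcept : IntBuffer() { swap(*this, other); }

    // Unified assignment: the by-value parameter is copy- or move-constructed
    // by the caller, then swapped in; its destructor frees our old contents.
    IntBuffer& operator=(IntBuffer other) noexcept {
        swap(*this, other);
        return *this;
    }

    ~IntBuffer() { delete[] data_; }

    friend void swap(IntBuffer& a, IntBuffer& b) noexcept {
        using std::swap;
        swap(a.size_, b.size_);
        swap(a.data_, b.data_);
    }

    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }
    const int& operator[](std::size_t i) const { return data_[i]; }

private:
    std::size_t size_;
    int* data_;
};
```

Because operator= takes its argument by value, one function covers both copy assignment (b = a) and move assignment (b = std::move(a)). If the copy throws, it does so before the target is touched, so the target is left unchanged.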

Clearing Eclipse when it seems to hang on to old project settings

When Eclipse seems to be holding on to old project settings:

  1. Make sure you’re updating the correct configuration file. For example, in a Spring project you might have XML files with very similar content for dev use, production use, and integration test use. If there’s an exception, check whether it mentions one of those file paths; you might be editing the wrong one!
  2. If all else fails,
    1. Shut down Eclipse
    2. Find .metadata/.plugins/org.eclipse.wst.server.core/ under your workspace dir and delete all its contents
    3. Restart Eclipse

C++ debugging in Eclipse IDE

If you’re a Java developer and find yourself also doing some coding in another language, you don’t want to have to change your IDE just to do so. That would make the learning curve twice as steep. Fortunately, if you use the Eclipse IDE and you need to code in C++ (or JavaScript), you don’t have to switch.

CHUA Hock-Chuan has an excellent guide about how to set this up. Some caveats:

  1. I tested with MinGW (Minimalist GNU for Windows); I don’t know if I’m missing out on something better in Cygwin (but every time I’ve encountered Cygwin’s installer, I wonder why it’s so difficult to use)
  2. To get support for the 2011 version of C++ (aka “C++0x” or “C++11”), go to Project | Properties | C/C++ Build | Settings | GCC C++ Compiler | Miscellaneous | Other flags and append -std=c++11 to whatever might already be there. Then do the same for GCC C Compiler | Miscellaneous | Other flags.
  3. “Clean”ing a project may not work on your Windows (ref1) (ref2), because it uses “rm -rf”. If this happens, install GNU CoreUtils, make sure it’s in your PATH (i.e., append ;C:\Program Files (x86)\GnuWin32\bin), and restart Eclipse.
  4. When you get to the point of entering the C++ helloworld, std, cout, and endl will show errors about not being able to resolve them. These errors disappeared when I saved the file, which probably means there’s a fast kind of syntax-checking done for (almost) every key press, and a more expensive compilation done only when edits are saved.
  5. If you check the Includes settings under Project, all the paths will be shown entirely in lower case. My actual paths use some camel-casing and work fine, so this must just be a quirk of the dialog.
  6. Even if the project is set to Build Automatically, you must still select Project | Build Project for the binaries to be created.
    1. If ‘clean’ or ‘build project’ fails with error ‘make cannot be found in PATH’, then ensure you have the following setting: Project | Properties | C/C++ Build | Tool Chain Editor | Current toolchain = MinGW GCC. If you had to change this setting, try cleaning or building again.

CFP: 5th Intl. Workshop on Human Behavior Understanding

I won’t be there, but I’m interested to see the paper titles after the event…

Call for Papers: 5th Int. Workshop on Human Behavior Understanding
(HBU'2014) to be held in conjunction with ECCV'14, 12 September,
Zurich, Switzerland

"Focus Theme: Computer Vision for Complex Social Interactions"
Short description:
The Fifth Workshop on Human Behavior Understanding, organized as a
satellite to ECCV'14, will gather researchers dealing with the problem
of modeling human behavior under its multiple facets (expression of
emotions, display of complex social and relational behaviors,
performance of individual or joint actions, etc.), with the focus
topic of computer vision for complex social interactions.
While different aspects of social interactions are tackled in several
venues, this workshop will solicit computer vision solutions that
clearly advance the field, and chart the future of computer analysis
of complex interactions. Topics of interest include, but are not
limited to:

-Human behavior capture technology and benchmark datasets
-Social activity detection, tracking, reconstruction, and recognition
-Social scene representation and understanding
-Social behavior modeling and prediction
-Multimodal social signal integration
-Causality and reciprocity of social interaction
-Applications of social intelligence

The HBU is organized as a full-day, single track event with invited
talks, oral presentations and poster presentations.

Submissions must represent original material. Papers are accepted for
review with the understanding that the same work has been neither
submitted to, nor published in, another journal or conference. All
manuscripts will undergo a rigorous review process by the members of
the program committee.
You can submit a paper now at:

Invited Speakers:
Shai Avidan, Tel-Aviv University
Marco Cristani, University of Verona
David Forsyth, University of Illinois at Urbana-Champaign
Daniel Gatica-Perez, Idiap Research Institute
Fei-Fei Li, Stanford University
James Rehg, Georgia Institute of Technology
Nicu Sebe, University of Trento
Alessandro Vinciarelli, University of Glasgow

Important Dates:
13 June: Submission of full papers (23:59 PST)
4 July: Notification of acceptance
11 July: Camera-ready paper submissions
12 September: HBU Workshop

You may contact H.S. Park or A.A. Salah with questions regarding HBU.

Organizing Committee:
Hyun Soo Park, Carnegie Mellon University, USA
Albert Ali Salah, Boğaziçi University, Turkey
Yong Jae Lee, University of California, Berkeley, USA
Louis-Philippe Morency, University of Southern California, USA
Yaser Sheikh, Carnegie Mellon University, USA
Rita Cucchiara, University of Modena and Reggio Emilia, Italy

(Tentative) Program Committee:
Hamid Aghajan, Stanford University, USA
Oya Aran, Idiap Research Institute, CH
Richard Bowden, University of Surrey, UK
Wongun Choi, NEC Laboratories America, USA
Peter Carr, Disney Research, USA
Marco Cristani, University of Verona, IT
Fernando de la Torre, Carnegie Mellon University, USA
Laurence Devillers, LIMSI, FR
Hamdi Dibeklioglu, Delft University of Technology, NL
Pınar Duygulu Sahin, Bilkent University, TR
Hazım Ekenel, Istanbul Technical University, TR
Alireza Fathi, Stanford University, USA
Raquel Fernandez Rovira, University of Amsterdam, NL
David Forsyth, University of Illinois at Urbana Champaign, USA
Jordi Gonzalez, UAB-CVC Barcelona, ES
Hatice Gunes, Queen Mary University of London, UK
Alexander Hauptmann, Carnegie Mellon University, USA
Hayley Hung, Delft University of Technology, NL
Nazli Ikizler-Cinbis, Hacettepe University, TR
Qiang Ji, Rensselaer Polytechnic Institute, USA
Mohan Kankanhalli, National University of Singapore, SG
Cem Keskin, Microsoft Research, UK
Kris Kitani, Carnegie Mellon University, USA
Ivan Laptev, INRIA, FR
Patrick Lucey, Disney Research, USA
Simon Lucey, CSIRO, AU
Jean Marc Odobez, Idiap Research Institute, CH
Greg Mori, Simon Fraser University, CA
Vittorio Murino, Istituto Italiano di Tecnologia and University of Verona, IT
Massimo Piccardi, University of Technology, Sydney, AU
Shishir Shah, University of Houston, USA
Alan Smeaton, Dublin City University, IE
Leonid Sigal, Disney Research, USA
Khiet Truong, University of Twente, NL

Dr. Albert Ali Salah
Bogazici University, Computer Engineering Dept.
34342 Bebek  - Istanbul, Turkey
Phone: +90 212 359 (7774)
Bogazici University, Cognitive Science MA Program
General co-chair, 16th ACM Int. Conf. on Multimodal Interaction

Call for papers: Special Issue on Mental Model Ascription by Intelligent Agents

2nd Call for Papers

Interaction Studies: Special Issue on Mental Model Ascription by Intelligent Agents

Mental model ascription, otherwise known as “mindreading”, involves inferring features of another human or artificial agent that cannot be directly observed, such as that agent’s beliefs, plans, goals, intentions, personality traits, mental and emotional states, and knowledge about the world. This capability is an essential functionality of intelligent agents if they are to engage in sophisticated collaborations with people. The computational modeling of mindreading offers an excellent opportunity to explore the interactions of cognitive capabilities, such as high-level perception (including language understanding and vision), theory of mind, decision-making, inferencing, reasoning under uncertainty, plan recognition and memory management. Contributions are sought that will advance our understanding of mindreading, with priority being given to carefully described, algorithmic or implemented approaches that address the practical necessity of computing prerequisite inputs. Formal evaluations are not required.

This volume was inspired by successful workshops at CogSci 2012 (Modeling the Perception of Intentions) and CogSci 2013 (Mental Model Ascription by Language-Enabled Intelligent Agents).

Since Interaction Studies targets a broad audience, authors are encouraged to provide sufficient context for their contributions and define specialist terminology.

The deadline for submissions is January 14, 2014. Questions, including those about submission requirements and instructions, can be addressed to the special edition editor, Marge McShane.

Reflections on mirror neurons

There hasn’t been much research in neuroscience that’s directly relevant to intention perception, except for the finding of mirror neurons. These are very small bundles of neurons that are activated both when performing certain actions and when observing someone else performing the same actions. Because all models to date of intention processing make it appear to be a very computationally intensive task, I’ve been skeptical that any small bundle of neurons could do it. And there’s reason to be skeptical, because any task that’s distributed across a region or regions of the brain might show low activation, while any bottleneck in communicating the results of those computations might show as quite active.

A new review article in the journal Cell provides further evidence for skepticism.

I agree with Wired’s description: “These findings are significant because they show how mirror neurons are not merely activated by incoming sensory information, but also by formulations developed elsewhere in the brain about the meaning of what is being observed.”

Getting on Singapore’s Do Not Call registry for calls, sms’s, and faxes

“Consumers who receive telemarketing calls despite having listed their numbers on the registry can complain to a watchdog called the Personal Data Protection Commission (PDPC). They may register through the website, or by text message by sending “DNC” to 78772 to block calls, text messages and fax messages; “DNC” to 78773 to block calls only; “DNC” to 78774 to block text messages only. They may also register by phone at 1800-248-0772 to block calls, text messages and fax messages; 1800-248-0773 to block calls only; or 1800-248-0774 to block text messages only.”

Telemarketers seem to be allowed 60 days to comply after a number is registered.

In with the old, onto the new

‘Just finished copying over lots of little tech notes I had posted on various old blogs and home pages from as long ago as 1998.  Whew!

I wanted to make all those tips findable by search engines, but I’m eager to start on real commentary about issues that excite me now, like commonsense reasoning in AI and parsing of natural language.