End user programming of social robots
Updated: 19 December 2014 - 11:59am by M.A. Salichs
One of the main challenges faced by social robots is how to provide intuitive, natural and enjoyable usability for the end-user. In everyday human environments, social robots could be important tools for education and entertainment (edutainment), in a var
Our approach targets general scenarios for an end-user who is not an expert programmer. The task is therefore not fixed a priori, but chosen by the end-user. The challenge we want to address is to make all the robot's skills accessible to the end-user, so that he or she can have fun using them and combining them into new skills, extending the robot's capabilities more easily through natural interaction with the robot.

The robot can guide the user through that construction by showing the possibilities available in each interaction context and by asking precise questions about the information it needs, maintaining the coherence of the interaction process. The implemented system therefore not only allows the natural programming of a social robot, but also helps to improve the naturalness of Human-Robot Interaction.

A Sequence Verbal Edition System (SVE) bridges the external utterances of the user and the internal skills of the robot, making these skills accessible by voice. This system allows the user to build new skills (new sequences) from primitive skills, and to reuse the created skills in subsequent sequences.

Interactive Sequence Edition

Semantics and Verbal Commands for the Edition of a Sequence

A sequence is essentially a graphical structure. The main goal of this work is to make the sequence verbally accessible. We have therefore studied how to design a system that translates from verbal utterances to graphical information.

Referential Commands

To establish the focus of attention for the sequence edition, the user can perform the following referential commands:
  1. Reference to a node somewhere in the already created sequence. E.g. “After raising the left arm...”

  2. Reference to a branch, which distinguishes between the different branches inside the sequence. E.g. “In the first branch...”

  3. Reference to a structure of multiple branches. This type of reference is used to conclude a part of the sequence made up of several branches in parallel, which could be a conditional selection or a parallel execution. E.g. “You finish all the branches, and then, wait until I touch your left shoulder”.

If the user does not explicitly specify a focus of attention, the Sequence Generation System (SGS) takes one by default. For instance, if the user adds an action, the next structural command is assumed to refer to that action, so the SGS sets the focus of attention on the newly added action by default.
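The default focus rule can be illustrated with a minimal sketch. The names `Node`, `SequenceEditor`, `add` and `refer_to` are hypothetical and are not taken from the implemented system; the sequence is also simplified to a flat list rather than a full graph.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the sequence: a step (action) or a transition (condition)."""
    kind: str   # "step" or "transition"
    label: str  # e.g. "raise left arm"


@dataclass
class SequenceEditor:
    """Hypothetical editor state: a flat list of nodes plus a focus index."""
    nodes: List[Node] = field(default_factory=list)
    focus: Optional[int] = None  # index of the node under attention

    def add(self, node: Node) -> None:
        # Insert right after the focused node (or at the start if no focus yet).
        position = 0 if self.focus is None else self.focus + 1
        self.nodes.insert(position, node)
        # Default rule from the text: the focus moves to the newly added node.
        self.focus = position

    def refer_to(self, label: str) -> None:
        # Referential command such as "After raising the left arm...".
        for index, node in enumerate(self.nodes):
            if node.label == label:
                self.focus = index
                return
        raise LookupError(f"no node labelled {label!r}")


editor = SequenceEditor()
editor.add(Node("step", "raise left arm"))
editor.add(Node("transition", "head touched"))
editor.add(Node("step", "lower left arm"))
editor.refer_to("raise left arm")            # explicit focus of attention
editor.add(Node("step", "raise right arm"))  # lands right after the focused node
labels = [node.label for node in editor.nodes]
```

Without the explicit `refer_to` call, each new node would simply be appended after the previous one, which is the default behaviour described above.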

Structural Commands

In the implemented system, end-users can perform the following structural commands:
  1. Addition of a node. The node can be a step (action) or a transition (condition). E.g.: The utterance “Raise the left arm” will add an action. The utterance “then...” or “Wait until I touch your head” will add a condition.

  2. Removal of a node. All nodes (steps or transitions) in the edition focus will be removed. E.g.: “Remove the last action.”

  3. Opening or closing a structure of multiple branches, which can be a sequence selection or simultaneous sequences, as explained above. E.g.: “You consider four possibilities” will open a selection between 4 branches; “You do three things, at once” will open 3 simultaneous branches.

  4. Loops, that is, joining one part of the sequence to a previous one. E.g.: “Go back to the beginning”.

Note that adding a node involves defining the content of that node: if it is an action, for instance, its parameters.
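As a rough illustration, the four structural commands can be sketched as operations on a small directed graph. `SequenceGraph` and its methods are hypothetical names, and reconnecting the neighbours of a removed node is an assumption of this sketch, not a documented behaviour of the system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SequenceGraph:
    """Hypothetical sequence graph: node id -> label, plus directed edges."""
    labels: Dict[int, str] = field(default_factory=dict)
    edges: List[Tuple[int, int]] = field(default_factory=list)
    _next_id: int = 0

    def add_node(self, label: str, after: List[int]) -> int:
        # Addition of a node, linked from every node listed in `after`.
        node_id = self._next_id
        self._next_id += 1
        self.labels[node_id] = label
        self.edges.extend((src, node_id) for src in after)
        return node_id

    def remove_node(self, node_id: int) -> None:
        # Removal of a node. Assumption: predecessors are reconnected
        # to successors so the sequence stays in one piece.
        preds = [s for s, d in self.edges if d == node_id]
        succs = [d for s, d in self.edges if s == node_id]
        self.edges = [(s, d) for s, d in self.edges if node_id not in (s, d)]
        self.edges.extend((p, q) for p in preds for q in succs)
        del self.labels[node_id]

    def open_branches(self, after: int, count: int) -> List[int]:
        # Opening a structure of `count` branches after a given node.
        return [self.add_node(f"branch {i + 1}", [after]) for i in range(count)]

    def loop(self, source: int, target: int) -> None:
        # Loop: join one part of the sequence to a previous one.
        self.edges.append((source, target))


graph = SequenceGraph()
start = graph.add_node("raise left arm", [])
branches = graph.open_branches(start, 3)          # "You do three things, at once"
join = graph.add_node("wait for shoulder touch", branches)  # closes the branches
graph.loop(join, start)                           # "Go back to the beginning"
```

Closing a structure of multiple branches is modelled here simply as adding a node whose predecessors are all the open branches.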

The Natural Programming System presented here supports multiple functionalities that can be easily implemented. One open question for a programming system is how to add new functions to the initial set of actions and conditions without changing the design of the system, that is, at execution time. In our approach, since the semantic grammar can be changed at run-time, it is possible to add new terms, e.g. new action names, to the vocabulary of the robot. A spelling detection capability can also help in this sense. On the other hand, since actions and conditions are implemented as an XML sequence with Python functions, a newly created function can be expressed in these terms.

New skills are created from primitive ones, and each newly created skill can be reused in subsequent sequence editions. However, the innate primitive set of actions/conditions plays an important role: this set has to cover all the possibilities of the robot. Here we have used just a small subset of the possibilities of Maggie: the low-level movements of the different DOF and the touch sensors. But Maggie has many other skills that can also be used in a new sequence. For instance, verbal skills such as ettsSkill and asrSkill could be used as follows: “When I say ‘stop’, you stop”, or “If I touch you in the side, you begin to laugh.” Many conditions can also be included in the set of primitives. For instance, “wait 5 seconds”, “when the battery level is below 50%...”, “If you detect an obstacle in the Laser...”, etc.
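The idea of extending the vocabulary at run-time and reusing created skills can be sketched as follows. This is only an illustration under assumptions: the `vocabulary` dictionary and `register_sequence` are hypothetical, and the actual system stores sequences as XML with Python functions rather than as closures.

```python
from typing import Callable, Dict, List

# Hypothetical vocabulary: spoken action name -> Python callable.
# The two entries stand in for innate primitives of the robot.
vocabulary: Dict[str, Callable[[], str]] = {
    "raise left arm": lambda: "left arm up",
    "lower left arm": lambda: "left arm down",
}


def register_sequence(name: str, steps: List[str]) -> None:
    """Add a new term at run-time whose meaning is a sequence of known terms."""
    unknown = [step for step in steps if step not in vocabulary]
    if unknown:
        raise KeyError(f"unknown actions: {unknown}")

    def run() -> str:
        # Execute each step in order; steps may themselves be created skills.
        return ", ".join(vocabulary[step]() for step in steps)

    vocabulary[name] = run


register_sequence("wave", ["raise left arm", "lower left arm"])
register_sequence("wave twice", ["wave", "wave"])  # reuses a created skill
result = vocabulary["wave twice"]()
```

Once registered, a created skill is indistinguishable from a primitive to later sequences, which is what allows skills to be built up incrementally.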

In other natural programming systems, the action primitives are designed one by one from a set of cue user utterances, so the primitives are well adapted to the user's speech. This method has the advantage of easily covering a wide range of user utterances, provided the application domain is well constrained. Nevertheless, this method of designing the “innate set” does not ensure that the primitives cover all the action possibilities of the robot. In our approach it was very important to cover all such possibilities, and Maggie has many of them. Therefore, the innate primitive set is closer to the low-level action-perception space of the robot.

At the speech or symbolic level, the semantic-CFG rules perform the necessary transformation from the user utterance, which is natural, into the action-perception space of the robot, which follows a formal model. The DMS is in charge of guiding the user and translating his/her natural language into an internal representation in terms of these primitives.
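A very reduced sketch of this translation step is given below. It uses regular expressions with attached semantic frames as a stand-in for the semantic-CFG rules, which it does not reproduce. The rule set, the frame keys and the `interpret` function are all hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical rules: each pattern carries a builder for the semantic frame
# it produces. Real semantic-CFG rules would be richer than these patterns.
RULES = [
    (re.compile(r"^(raise|lower) the (left|right) arm$"),
     lambda m: {"type": "action", "skill": "arm",
                "move": m.group(1), "side": m.group(2)}),
    (re.compile(r"^wait until i touch your (head|left shoulder|right shoulder)$"),
     lambda m: {"type": "condition", "sensor": "touch", "part": m.group(1)}),
    (re.compile(r"^wait (\d+) seconds$"),
     lambda m: {"type": "condition", "sensor": "timer",
                "seconds": int(m.group(1))}),
]


def interpret(utterance: str) -> Optional[Dict[str, object]]:
    """Map a natural utterance to a frame in the action-perception space."""
    text = utterance.strip().lower().rstrip(".")
    for pattern, build in RULES:
        match = pattern.match(text)
        if match:
            return build(match)
    return None  # a dialogue manager could ask the user to rephrase here


frame = interpret("Wait until I touch your head.")
```

Returning `None` for an unrecognised utterance leaves room for the guiding role described above, where the robot queries the user instead of failing silently.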

Experimental tests with end-users show that they greatly enjoy playing the game of “teaching how to do things” with Maggie. The robot is not yet able to adapt itself to the complexity of the user’s communication rhythms, and it is the user who has to learn how and when to articulate the utterances. Nevertheless, we have found that end-users make this adaptation intuitively and easily, and that they enjoy doing so. End-users see Maggie as a baby-agent that is still developing, which produces a kind of sense of superiority.

The system impresses end-users in two ways: on the interaction side, through the naturalness of the interaction; on the execution side, because the robot performs what the user has told it to do.
