Multimodal human-robot interaction


Description

To address the Human-Robot Interaction problem, we believe the following issues have to be resolved:

Meaning Generation


The robot has to be able to understand its context, i.e. to detect and identify objects and humans. The ability to give meaning to the objects in the environment will greatly improve the robot's interaction with them.

The essence of this problem is the formulation process: how to represent the meaning of something. It is a knowledge representation problem, one that has been studied throughout human history by philosophers, psychologists and other scientists. Many different and interesting approaches have emerged in recent years, but applying these ideas to robotics is not a trivial issue.

For interaction with the human, we are developing a user model intended to include the user's mental models, emotions, beliefs, desires and intentions.

Human-Human Interaction


In this field we use the subdivision established by Morris in “Foundations of the Theory of Signs”, which divides human communication into three areas:


  1. Syntax. It covers information theory: coding, channels, capacity, noise, redundancy and other statistical properties of language.

  2. Semantics. Meaning is the central concern of semantics. In the communication process, the sender and the receiver have to agree on the meaning of a message.

  3. Pragmatics. The pragmatic domain studies the effects of communication on the behaviour of both the sender and the receiver.


This brief schema establishes the frame of our human communication research. The model we are developing will be implemented by means of the Automatic-Deliberative Architecture. This architecture, created by Ramón Barber, is a hybrid control architecture. The Syntax level above corresponds to the low level of the architecture, the Automatic level, while the Semantic level corresponds to its high level, the Deliberative level.

Human-Robot Interaction


The main goal of the above issues is to build a model that can be implemented in a computer system: we want to give the user the sensation of interacting efficiently with the personal robot. We consider two different ways to approach the problem: the inner approach and the outer approach. In the inner approach, we develop a Human-Human Interaction model in the robot and then adjust it at the pragmatic level so that the interaction dynamic works correctly. In the outer approach, we are more interested in developing a model that satisfies the interaction dynamic directly; this model does not have to be a human-based model.

By interaction dynamic we mean the process over time in which the robot acts on the user and detects what the user does. This dynamic process involves specific time parameters (silence time, question time, waiting times, etc.), specific movements (blinking, nodding, etc.) and the detection of specific user gestures and movements ("user is speaking", "user is very close", etc.).
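As an illustration of this interaction dynamic, the following minimal Python sketch shows one perception-action cycle driven by such timing parameters. The robot and perception interfaces, as well as all timeout values, are hypothetical assumptions for the example, not the ones used on our robots.

```python
import time

# Hypothetical timing parameters of the interaction dynamic (illustrative values).
SILENCE_TIMEOUT = 4.0    # seconds of user silence before the robot prompts again
CLOSE_DISTANCE = 0.8     # metres; below this the user is considered "very close"

def interaction_step(robot, perception):
    """One cycle: act on the user and react to what the user is doing.
    `robot` and `perception` are assumed interfaces (duck-typed here)."""
    if perception.user_is_speaking():
        robot.look_at_user()                 # show attention while the user talks
    elif perception.silence_time() > SILENCE_TIMEOUT:
        robot.blink()                        # filler movement during long silences
        robot.say("Are you still there?")
    if perception.user_distance() < CLOSE_DISTANCE:
        robot.step_back()                    # respect the user's personal space

def run_interaction(robot, perception, period=0.1):
    """The dynamic process evolves over time as a periodic loop."""
    while perception.user_present():
        interaction_step(robot, perception)
        time.sleep(period)
```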

Human-robot interaction is defined as the study of humans, robots, and the ways they influence each other. This interaction can be social if the robots are able to interact with humans as partners, if not peers. In this case, there is a need to provide humans and robots with models of each other. Sheridan argues that the ideal would be analogous to two people who know each other well and who can pick up subtle cues from one another (e.g., musicians playing a duet).

A social robot has attitudes or behaviours that take the interests, intentions or needs of humans into account. Bartneck and Forlizzi define a social robot as “an autonomous or semi-autonomous robot that interacts and communicates with humans by following the behavioral norms expected by the people with whom the robot is intended to interact”. The term sociable robot was coined by Breazeal in order to distinguish an anthropomorphic style of human-robot interaction from insect-inspired interaction behaviours. In this context, sociable robots can be considered a distinct subclass of social robots. She defines sociable robots as socially participative creatures with their own internal goals and motivations.

Multimodality


Multimodality allows humans to move seamlessly between different modes of interaction, from visual to voice to touch, according to changes in context or user preference. A social robot must provide multimodal interfaces, which try to integrate speech, written text, body language, gestures, eye or lip movements and other forms of communication in order to better understand the human and to communicate more effectively and naturally.

The different modalities in HRI can be grouped into two types: perception modes and expression modes. The modes work separately, that is, they do not communicate with each other directly. Global synchronization between them is performed by an upper entity called the Communication Act Skill (a minimal coordination sketch follows the mode list below). Our multimodality model for robot interaction is based on these modes:


  • Visual: gesture expression and recognition.

  • Tactile: tactile sensor and tactile screen perception.

  • Voice: text-to-speech and automatic-speech-recognition.

  • Audiovisual: sound and visual expression.

  • Remote: web-2.0 interaction.
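The following minimal Python sketch illustrates how an upper entity such as the Communication Act Skill could synchronize otherwise independent modes. The class and method names are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Mode:
    """A perception or expression mode; modes never talk to each other directly."""
    name: str
    kind: str                      # "perception" or "expression"

    def express(self, content: str) -> None:
        print(f"[{self.name}] expressing: {content}")

@dataclass
class CommunicationActSkill:
    """Upper entity that globally synchronizes the independent modes."""
    modes: Dict[str, Mode] = field(default_factory=dict)

    def register(self, mode: Mode) -> None:
        self.modes[mode.name] = mode

    def communicate(self, content: str, channels: List[str]) -> None:
        # Trigger the selected expression modes at the same time step.
        for name in channels:
            mode = self.modes.get(name)
            if mode and mode.kind == "expression":
                mode.express(content)

skill = CommunicationActSkill()
skill.register(Mode("voice", "expression"))
skill.register(Mode("visual", "expression"))
skill.communicate("greeting", channels=["voice", "visual"])
```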

Visual Interactive Mode: Gesture Expression Model


The Visual Mode includes all visible expressive acts. Traditionally, it is divided into kinesics (body gestures) and proxemics (body placement within the communication setting). We treat the audiovisual mode, explained later, as a separate interactive mode. The importance of body movements in the communication act is well established, since they carry a great deal of information that flows very quickly: Birdwhistell argues that 65% of the information in a human-human interaction is non-verbal. Visual gestures reveal human thoughts and mood, and they reply to, complement, accent and adjust verbal information. Several problems arise when building a model of human gestures that could be implemented in a robot. We distinguish two directions: gesture expression modelling and gesture recognition. At the moment only the former is being addressed.

A discrete set of atomic gestures has been implemented. An atomic gesture lasts less than approximately five seconds, and each atomic gesture can be interrupted in real time by another one in order to compose the final dynamic expression.
Attending to the whole life of a gesture, gestures are divided into acquired and non-acquired (innate) gestures. When the robot becomes active, it starts with a set of non-acquired gestures that may or may not be kept throughout its life; the robot can also learn further gestures from the user.
Attending to gesture dynamics, we distinguish gestures that do or do not have a defined ending, and gestures that must or need not start from a given initial position.
Each atomic gesture has an intensity and a velocity parameter that modulate it.
Attending to the way each gesture can be interpreted, we consider the following categories (a data-structure sketch follows this list):


  • Emblems: gestures that replace words and sentences.
  • Illustrators: gestures that reinforce verbal messages.
  • Affective gestures: gestures that show emotions and express affect.
  • Regulators or control gestures: they regulate the flow and manner of communication and are among the most culturally determined gestures.
  • Adaptors: they release emotional and physical tension and are performed with a low level of awareness.
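A possible data structure for the atomic gestures described above is sketched below. The field names and the example gesture are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum, auto

class GestureCategory(Enum):
    EMBLEM = auto()
    ILLUSTRATOR = auto()
    AFFECTIVE = auto()
    REGULATOR = auto()
    ADAPTOR = auto()

@dataclass
class AtomicGesture:
    name: str
    category: GestureCategory
    duration: float           # seconds, below ~5 s by design
    acquired: bool            # learned from the user (True) or innate (False)
    has_ending: bool          # whether the gesture has a defined final posture
    needs_initial_pose: bool  # whether it must start from a given initial position
    intensity: float = 1.0    # modulation parameters in [0, 1]
    velocity: float = 1.0

    def interrupt_with(self, other: "AtomicGesture") -> "AtomicGesture":
        """Any atomic gesture can be pre-empted in real time by another one."""
        return other

# Example: an innate regulator gesture with moderate intensity and velocity.
nod = AtomicGesture("nod", GestureCategory.REGULATOR, duration=1.5,
                    acquired=False, has_ending=True, needs_initial_pose=False,
                    intensity=0.7, velocity=0.5)
```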

Tactile Mode


Two different kinds of tactile modes can be distinguished: tactile skin sensing and tactile screen sensing. The former is analogous to human skin sensing; the latter is exclusive to robots. Depending on the hardware, the robot can detect that something is touching it, where the touch occurs, and obtain information about the applied force. The tactile screen allows the robot to perceive ink-gesture data introduced by the user. Since the tactile screen is also displaying an image, the ink-gesture data has to be interpreted against that image. The ability to show images on the tactile screen is explained later in the audiovisual interactive mode.
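As a rough illustration of interpreting ink-gesture data against the image currently displayed, the following sketch performs simple hit-testing of a stroke against labelled screen regions. The types, region names and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ScreenRegion:
    """A rectangular area of the image currently shown on the tactile screen."""
    label: str
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

def interpret_ink_gesture(stroke: List[Tuple[int, int]],
                          regions: List[ScreenRegion]) -> Optional[str]:
    """Relate the ink stroke to the displayed image: the gesture is interpreted
    as selecting the region where most of its points fall."""
    counts = {r.label: sum(r.contains(x, y) for x, y in stroke) for r in regions}
    label, hits = max(counts.items(), key=lambda kv: kv[1], default=(None, 0))
    return label if hits > 0 else None

regions = [ScreenRegion("yes_button", 0, 0, 100, 50),
           ScreenRegion("no_button", 120, 0, 100, 50)]
print(interpret_ink_gesture([(10, 10), (30, 20), (60, 30)], regions))  # yes_button
```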

Voice Mode


This mode is in charge of verbal human-robot communication.

Verbal Perception.


The verbal signal can be interpreted for speech recognition, but it also provides prosodic information and information about the user's location.
Our automatic speech recognition model is based on a system of dynamic ASR grammars running on an ASR engine. The set of active grammars can be changed in real time, and the set of ASR grammars is defined a priori according to which information is useful for the robot. Each grammar is related to a speech act, so speech recognition works as a speech act trigger.
No ontological information is considered at the moment.
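A minimal sketch of the grammar-to-speech-act triggering idea is given below, with grammars simplified to plain phrase sets. The grammar names and speech acts are illustrative; a real ASR engine would match full grammars rather than exact strings.

```python
# Each active grammar is bound to a speech act; when the recognizer matches an
# utterance against one of the active grammars, the bound speech act is triggered.
active_grammars = {
    "greeting_grammar": {
        "phrases": {"hello", "good morning", "hi robot"},
        "speech_act": "GREET_USER",
    },
    "weather_grammar": {
        "phrases": {"what is the weather", "weather forecast"},
        "speech_act": "REPORT_WEATHER",
    },
}

def set_active_grammars(grammars):
    """The set of active grammars can be swapped at run time."""
    global active_grammars
    active_grammars = grammars

def recognize(utterance: str):
    """Return the speech act triggered by the utterance, if any."""
    text = utterance.lower().strip()
    for grammar in active_grammars.values():
        if text in grammar["phrases"]:
            return grammar["speech_act"]
    return None

print(recognize("Hello"))                 # GREET_USER
print(recognize("What is the weather"))   # REPORT_WEATHER
```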

Verbal Expression.


The speech system is based on two types of sentences: fixed sentences and variable sentences. The former are designed a priori and correspond to recurring episodes that always occur in a common conversation.
Variable sentences are built using a fixed grammar. When the speech skill decides to use a variable sentence, it first chooses a grammar with slots; the slots are then filled with the words appropriate to the context.
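The sketch below illustrates the distinction between fixed sentences and slot-based variable sentences. The episode names, grammars and context keys are invented for the example.

```python
import random

# Fixed sentences cover recurring episodes of a common conversation.
FIXED_SENTENCES = {
    "greeting": ["Hello, nice to see you.", "Good morning!"],
    "farewell": ["Goodbye, see you soon."],
}

# Variable sentences are grammars with slots filled from the current context.
VARIABLE_GRAMMARS = {
    "inform_weather": "Today in {city} it will be {forecast}.",
    "offer_help": "Do you want me to {action}?",
}

def fixed_sentence(episode: str) -> str:
    return random.choice(FIXED_SENTENCES[episode])

def variable_sentence(grammar_id: str, context: dict) -> str:
    # The grammar slots are completed with words appropriate to the context.
    return VARIABLE_GRAMMARS[grammar_id].format(**context)

print(fixed_sentence("greeting"))
print(variable_sentence("inform_weather", {"city": "Madrid", "forecast": "sunny"}))
```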

Audiovisual Mode


A personal robot incorporates and works by means of one or more computers, so the range of possible communication channels can be extended from emulating human communication to other possibilities that a computer offers, for example electronic sound synthesis. The sound mode can be used for:

  • Mood, affect or emotion expression associated with long-term states: happy/sad or angry/calm.
  • Interjection expression associated with short-term states: fright, startle, laughter, crying, etc.
  • Notification sounds, to get the user's attention and signal interaction prompts, etc.
  • Singing skill using synthesized instruments.
  • Sound imitation: siren sounds, dog barking and other natural sounds, etc.

We are studying the communicational side of sound in music, extracting a set of parameters for sound synthesis and relating these parameters to the kind of message or intention that the robot wants to communicate. We are implementing a Sound Synthesis System that takes internal robot state parameters as inputs and synthesizes sounds for expression. These internal parameters include, but are not limited to, emotional state, emotional magnitude and mood energy.
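A minimal sketch of such a mapping from internal state to synthesis parameters is shown below, assuming a simple gated sine-tone generator. The emotion labels, parameter values and mapping rules are illustrative only, not the ones used in the actual system.

```python
import math

def synthesis_params(emotion: str, magnitude: float) -> dict:
    """Map an internal state (emotion label, magnitude in [0, 1]) to synthesis
    parameters. The values are purely illustrative."""
    base = {"happy": {"pitch_hz": 440.0, "tempo_bps": 4.0},
            "sad":   {"pitch_hz": 220.0, "tempo_bps": 1.0},
            "angry": {"pitch_hz": 330.0, "tempo_bps": 6.0},
            "calm":  {"pitch_hz": 262.0, "tempo_bps": 2.0}}[emotion]
    # Higher emotional magnitude -> louder and slightly higher-pitched sound.
    return {"pitch_hz": base["pitch_hz"] * (1.0 + 0.2 * magnitude),
            "tempo_bps": base["tempo_bps"],
            "amplitude": 0.3 + 0.7 * magnitude}

def synthesize(params: dict, duration_s: float = 1.0, rate: int = 16000):
    """Generate raw samples of a simple beeping tone from the parameters."""
    samples = []
    for n in range(int(duration_s * rate)):
        t = n / rate
        gate = 1.0 if math.fmod(t * params["tempo_bps"], 1.0) < 0.5 else 0.0
        samples.append(params["amplitude"] * gate *
                       math.sin(2 * math.pi * params["pitch_hz"] * t))
    return samples

tone = synthesize(synthesis_params("happy", 0.8))
```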

The audiovisual mode refers to the expression of synchronized images, video and sound, music, or voice. Moreover, audiovisual expressions (sound, video and computer-generated graphics) can also be triggered and provided to the user as feedback or as a response to the robot's initiatives.

Remote Mode


This is the most robot-specific interactive mode. Since the core of a robot is its computer, the robot can also use all the capabilities that the computer offers, and one of the most important things a computer can do is connect to the Internet and access remote information. At the same time, the Internet is growing so much that network protocols are shifting towards more machine-centred protocols; in this sense, Web 2.0 and the so-called semantic web offer inter-computer communication as never before. In this way, the Internet works as a big sensor for the robot, which can access weather reports, news, e-mail, bus timetables, etc. The robot can also receive orders from a remote user and interact with that user through a chat skill, video conference, etc.
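The sketch below shows the internet-as-sensor pattern in its simplest form: the robot queries a web service and treats the reply as sensor data. The endpoint URL and JSON fields are hypothetical placeholders, not a real service.

```python
import json
import urllib.error
import urllib.request

def read_internet_sensor(url: str) -> dict:
    """Treat a remote web service as one more robot sensor: query it and
    return the parsed JSON payload."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)

# Hypothetical endpoint; a real deployment would point at actual services
# (weather reports, news feeds, bus timetables, e-mail, ...).
try:
    weather = read_internet_sensor("http://example.org/api/weather?city=Madrid")
    print(weather.get("forecast", "unknown"))
except (urllib.error.URLError, json.JSONDecodeError):
    print("remote sensor unavailable")
```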

Important HRI research groups


HRI events

