Multimodal Fusion as Communicative Acts during Human-Robot Interaction
Cybernetics and Systems: An International Journal
2013


Research on dialog systems is a very active area in social robotics. During the last two decades, these systems have evolved from those based only on speech recognition and synthesis to the current and modern systems, which include new components and multimodality. By multimodal dialogue we mean the interchange of information among several interlocutors, not just using their voice as the mean of transmission but also all the available channels such as gestures, facial expressions, touch, sounds, etc. These channels add information to the message to be transmitted in every dialogue turn. The dialogue manager (IDiM) is one of the components of the robotic dialog system (RDS) and is in charge of managing the dialogue flow during the conversational turns. In order to do that, it is necessary to coherently treat the inputs and outputs of information that flow by different communication channels: audio, vision, radio frequency, touch, etc. In our approach, this multichannel input of information is temporarily fused into communicative acts (CAs). Each CA groups the information that flows through the different input channels into the same pack, transmitting a unique message or global idea. Therefore, this temporary fusion of information allows the IDiM to abstract from the channels used during the interaction, focusing only on the message, not on the way it is transmitted. This article presents the whole RDS and the description of how the multimodal fusion of information is made as CAs. Finally, several scenarios where the multimodal dialogue is used are presented.