Decision Making System for Autonomous Social Agents Based on Emotions and Self-learning
Sobresaliente "Cum Laude" (Outstanding, with honours)

The objective of this thesis is to develop a decision-making system for an autonomous and social robot. This system is composed of several subsystems: a motivational system, a drives system, and an evaluation and behaviour selection system. All of them are based on motivations, drives and emotions, concepts that are described in detail in this thesis.
Due to the difficulties of working with a real robot, it was decided to implement this decision-making system on virtual agents as a preliminary step. These agents live in a virtual world built using a text-based MUD (Multi-User Domain). In this world the agents can interact with each other, enabling social interaction, and with the other objects present in the world. This text-based game was selected, instead of a modern one with a graphical interface, because the interpretation of the information it provides is much simpler.
The selection of behaviours is learned by the agent using reinforcement learning algorithms. When the agent is not interacting with other agents, it uses the Q-learning algorithm. When social interaction exists, the rewards the agent receives depend not only on its own actions, but also on the actions of the other agent. In this case, the agent uses multi-agent learning algorithms, also based on Q-learning.
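The single-agent case described above can be sketched with standard tabular Q-learning. The following minimal example uses an epsilon-greedy behaviour selection; the state and action names, as well as the parameter values, are invented for illustration and are not taken from the thesis:

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.2  # exploration probability

ACTIONS = ["eat", "play", "rest"]  # illustrative behaviours
Q = defaultdict(float)             # maps (state, action) -> learned value

def select_action(state):
    """Epsilon-greedy selection: explore sometimes, otherwise pick the
    behaviour with the highest learned Q-value in this state."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

With an all-zero table, a single call such as `update("hungry", "eat", 1.0, "sated")` moves `Q[("hungry", "eat")]` to 0.1, and repeated experience gradually propagates value backwards through the state space.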
The inner state of the agent is identified with its dominant motivation. As a consequence, the system is not completely Markovian and is therefore harder to work with. In order to simplify the learning process, the states related to the objects are considered mutually independent. The state of the agent is then a combination of its inner state and its state in relation to the rest of the agents and objects. Treating the object-related states as independent means that possible relations between objects are ignored. In fact, an action addressed to one object may affect the state in relation to other objects, causing "collateral effects". In this thesis, a new variation of Q-learning is proposed to take these effects into account.
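One way to picture such a variation is a per-object Q-table whose update also folds in the value change observed in the states relative to the other objects. The decomposition and the correction term below are an illustrative sketch under that assumption, not the exact rule derived in the thesis:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # maps (object, object_state, action) -> value

def V(obj, obj_state, actions):
    """Value of the state relative to one object: best Q over its actions."""
    return max(Q[(obj, obj_state, a)] for a in actions)

def update(obj, s, a, reward, s_next, collateral, actions):
    """Update the Q-table of the object the action was addressed to,
    adding the value change observed in the states relative to the
    other objects (the "collateral effects").

    collateral: list of (other_obj, old_state, new_state) triples
    describing how the action incidentally changed other objects.
    """
    side_effects = sum(V(o, new, actions) - V(o, old, actions)
                       for o, old, new in collateral)
    target = reward + side_effects + GAMMA * V(obj, s_next, actions)
    Q[(obj, s, a)] += ALPHA * (target - Q[(obj, s, a)])
```

Keeping one small table per object preserves the simplification of independent object-related states, while the `side_effects` term lets the agent credit (or blame) an action for what it did to its relations with the other objects.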
This system uses happiness and sadness as positive and negative reinforcement, respectively. Therefore, behaviours are not selected to satisfy the goals determined by the motivations of the agent, but to reach happiness and avoid sadness. Appraisal theories of emotion state that emotions are the result of evaluation processes and are therefore subjective. Based on those theories, this decision-making system considers that certain emotions are generated from the evaluation of the wellbeing of the agent, which measures how well the needs of the agent are satisfied. Happiness is produced when something good happens, i.e. the wellbeing increases. On the contrary, sadness is produced when something bad happens, so the wellbeing decreases.
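This wellbeing-based reinforcement can be sketched as the signed change in wellbeing between two moments, with a positive change read as happiness (reward) and a negative one as sadness (punishment). The drive names, the ideal value and the weights below are invented for illustration:

```python
# Illustrative importance weights for each drive (need); assumed, not from
# the thesis.
WEIGHTS = {"hunger": 1.0, "fatigue": 0.5}

def wellbeing(drives):
    """Wellbeing is higher when the agent's drives (unsatisfied needs)
    are lower; 100.0 plays the role of an ideal, fully satisfied state."""
    ideal = 100.0
    return ideal - sum(WEIGHTS[d] * v for d, v in drives.items())

def emotional_reward(drives_before, drives_after):
    """Signed wellbeing change: positive -> happiness (positive
    reinforcement), negative -> sadness (negative reinforcement)."""
    return wellbeing(drives_after) - wellbeing(drives_before)
```

For example, an action that lowers hunger from 50 to 10 while leaving fatigue untouched yields a reward of +40 (happiness), whereas the reverse transition yields -40 (sadness).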
Finally, another emotion is introduced: fear. Fear is presented from two points of view: being afraid of executing risky actions, and being afraid of being in a dangerous state. In the latter case, fear is treated as a motivation, in accordance with other theories of emotion.

Universidad Carlos III de Madrid