In this paper, an autonomous social robot lives in a laboratory where it can interact with several items, including people. Its goal is to learn by itself the behaviors needed to keep its wellbeing as high as possible. Several experiments have been conducted to test the performance of the system.
The Object Q-Learning algorithm has been implemented on the robot as its learning algorithm. This algorithm is a variation of traditional Q-Learning in that it considers a reduced state space and collateral effects. The first part of the experiments compares the performance of both algorithms. Moreover, two mechanisms intended to shorten the learning sessions have been included: Well-Balanced Exploration and Amplified Reward. Their advantages are justified by the results obtained in the second part of the experiments.
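To make the idea of a reduced, per-object state space concrete, the sketch below shows an epsilon-greedy tabular Q-learner that keeps a separate Q-table for each object the robot interacts with. This is an illustrative sketch only: the function names, the hyperparameter values, and the exact decomposition are assumptions, not the paper's formulation of Object Q-Learning, and the collateral-effects term is omitted for brevity.

```python
import random
from collections import defaultdict

# Hyperparameters (illustrative values, not taken from the paper).
ALPHA, GAMMA, EPSILON = 0.3, 0.9, 0.2

# One Q-table per object: Q[obj][(state, action)] -> value.
# Factoring the state per object is what keeps the state space reduced.
Q = defaultdict(lambda: defaultdict(float))

def choose_action(obj, state, actions):
    """Epsilon-greedy selection over the object's own (reduced) state."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[obj][(state, a)])

def update(obj, state, action, reward, next_state, actions):
    """Standard tabular Q-learning update applied to one object's table."""
    best_next = max(Q[obj][(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    Q[obj][(state, action)] += ALPHA * (td_target - Q[obj][(state, action)])
```

Because each object keeps its own table, adding a new item to the laboratory grows the representation additively rather than multiplying the joint state space, which is the practical benefit of the reduced state space discussed above.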
Finally, the behaviors learned by our robot are analyzed. The resulting behaviors have not been pre-programmed; rather, they have been learned through real interaction in the real world and are related to the robot's motivations. These are natural behaviors in the sense that they can easily be understood by humans observing the robot.