The Motivated Automatic-Deliberative architecture
In our AD architecture implementation, deliberative skills are based on these activities, and we consider that only one deliberative skill can be active at a time.
In the AD implementation, the automatic level is mainly formed by skills related to sensors and actuators. Automatic skills can run in parallel, and they can be merged in order to achieve more complex skills.
Events are the mechanism the architecture uses for working in a cooperative way. An event is an asynchronous signal that coordinates processes by being emitted and captured. This is implemented with the publisher/subscriber design pattern, so an element that generates events does not know whether those events are received and processed by others.
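As an illustration, the following minimal sketch shows a publisher/subscriber event hub of this kind. The class and method names are ours, not those of the AD implementation, and events are dispatched synchronously here for brevity, whereas the architecture treats them as asynchronous signals:

```python
from collections import defaultdict
from typing import Any, Callable

class EventManager:
    """Minimal publisher/subscriber hub: emitters do not know who listens."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[event_name].append(handler)

    def emit(self, event_name: str, payload: Any = None) -> None:
        # The emitter fires and forgets: zero, one, or many handlers may react.
        for handler in self._subscribers.get(event_name, []):
            handler(payload)

# A skill can subscribe to an event emitted anywhere else in the architecture.
events = EventManager()
events.subscribe("obstacle_detected", lambda data: print("avoiding:", data))
events.emit("obstacle_detected", {"distance_m": 0.4})
```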
The Short-Term Memory is a memory area, accessible by different processes, where the most important data is stored. Data of different types can be distributed there and is available to all elements of the AD architecture. For each item, the current value, the previous value, and the date of the data capture are stored; when new data is written, the old data is not eliminated but kept as the previous version. The Short-Term Memory allows registering and removing data structures and reading and writing particular items, and several skills can share the same data. It is based on the blackboard design pattern.
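A minimal sketch of such a blackboard is given below. The names are illustrative, and the inter-process access machinery of the real Short-Term Memory is omitted:

```python
import time
from typing import Any

class ShortTermMemory:
    """Blackboard-style shared store: for each item it keeps the current
    value, the previous value, and the date of the data capture."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, Any]] = {}

    def register(self, name: str) -> None:
        self._items[name] = {"current": None, "previous": None, "timestamp": None}

    def remove(self, name: str) -> None:
        del self._items[name]

    def write(self, name: str, value: Any) -> None:
        item = self._items[name]
        item["previous"] = item["current"]  # old data is kept as the previous version
        item["current"] = value
        item["timestamp"] = time.time()

    def read(self, name: str) -> Any:
        return self._items[name]["current"]

    def read_previous(self, name: str) -> Any:
        return self._items[name]["previous"]
```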
On the other hand, the Long-Term Memory has been implemented as a database and a set of files containing information such as data about the world, the skills, and grammars for the automatic speech recognition module.

The essential component in the AD architecture is the skill, which is present at both levels. In software engineering terms, a skill is a class that hides the data and processes describing the global behavior of a robot task or action. The core of a skill is its control loop, which can be running (the skill is activated) or not (the skill is locked). Skills can be activated by other skills, by a sequencer, or by the decision making system, and they can return data or events to the activating element or to other skills interested in them. Skills are characterized as follows:
- They have three states: ready (just instantiated), activated (running the control loop), and locked (not running the control loop).
- They have three working modes: continuous, periodic, and event-driven.
- Each skill is a process. Communication among processes takes place through the Short-Term Memory and events.
- A skill represents one or more tasks or a combination of several skills.
- Each skill must be subscribed to at least one event and must define its behavior when that event arises.
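To make these characteristics concrete, the following skeleton sketches a skill as a class with the three states above and an event subscription, reusing the EventManager from the earlier sketch. The structure is our illustration, not the actual AD code:

```python
import enum

class SkillState(enum.Enum):
    READY = "ready"          # just instantiated
    ACTIVATED = "activated"  # control loop running
    LOCKED = "locked"        # control loop not running

class Skill:
    """Skeleton of a skill: a control loop plus at least one event subscription."""

    def __init__(self, events: EventManager, mode: str = "continuous") -> None:
        self.state = SkillState.READY
        self.mode = mode  # "continuous", "periodic", or "event-driven"
        self.events = events

    def subscribe(self, event_name: str) -> None:
        # Every skill must be subscribed to at least one event.
        self.events.subscribe(event_name, self.on_event)

    def activate(self) -> None:
        self.state = SkillState.ACTIVATED
        while self.state is SkillState.ACTIVATED:
            self.control_loop_step()

    def lock(self) -> None:
        self.state = SkillState.LOCKED

    def control_loop_step(self) -> None:
        raise NotImplementedError  # task-specific behavior of the concrete skill

    def on_event(self, payload: object) -> None:
        raise NotImplementedError  # behavior when a subscribed event arises
```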
The AD architecture allows complex skills to be generated from atomic (indivisible) skills. Moreover, a skill can be used by several different complex skills, which makes the architecture flexible.
The decision making system
The decision making system communicates bidirectionally with the AD architecture. On one side, the decision making system selects the behavior the robot must execute according to its state; the AD architecture carries out this behavior by activating the corresponding skill or skills (deliberative or automatic). On the other side, the decision making system needs information from the architecture in order to update the internal and external state of the robot.
Drives and Motivations
The term homeostasis means maintaining a stable internal state. This internal state can be configured by several variables, each of which must be kept at an ideal level. When the value of one of these variables differs from its ideal level, an error signal appears: the drive. In our approach, the autonomous robot has certain needs (drives) and motivations and, following the ideas of Hull and Balkenius, the intensities of the robot's motivations are modeled as a function of its drives and some external stimuli. For this purpose we used Lorenz's hydraulic model of motivation, in which the internal drive strength interacts with the external stimulus strength: if the drive is low, a strong stimulus is needed to trigger a motivated behavior; if the drive is high, a mild stimulus is sufficient. The general idea is that we are motivated to eat when we are hungry, but also when we have food in front of us even though we do not really need it.
In our approach, once the intensity of each motivation is calculated, the motivations compete among themselves to become the dominant one, and the dominant motivation determines the inner state of the robot.
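One simple way to encode this competition is sketched below. The additive combination of drive and weighted stimulus is an assumption made for illustration; it is one common reading of the hydraulic model rather than the exact formula of our implementation:

```python
def motivation_intensity(drive: float, stimulus: float, weight: float = 1.0) -> float:
    """Lorenz-style interaction: a high drive needs only a mild stimulus,
    a low drive needs a strong one. Here: intensity = drive + weight * stimulus."""
    return drive + weight * stimulus

def dominant_motivation(drives: dict[str, float], stimuli: dict[str, float]) -> str:
    """Motivations compete; the most intense one defines the inner state."""
    intensities = {name: motivation_intensity(drives[name], stimuli.get(name, 0.0))
                   for name in drives}
    return max(intensities, key=intensities.get)

# The hunger drive is only moderate, but food is in sight, so 'hunger' dominates.
print(dominant_motivation({"hunger": 0.4, "social": 0.5},
                          {"hunger": 0.6, "social": 0.0}))  # hunger: 1.0 vs social: 0.5
```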
In this decision making system, there are no motivational behaviors. This means that the robot does not necessarily know in advance which behaviors to select in order to satisfy the drive related to the dominant motivation. Instead, there is a repertory of behaviors that can be executed depending on the relation of the robot with its environment, i.e. the external state. For example, the robot is able to interact with people only as long as someone accompanies it.
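The following hypothetical repertory illustrates this gating: each behavior carries a precondition on the external state, and only behaviors whose precondition holds can be executed. The behavior names and state keys are invented for the example:

```python
from typing import Callable

# Each behavior is paired with a predicate over the external state.
BEHAVIORS: dict[str, Callable[[dict], bool]] = {
    "interact_with_people": lambda ext: ext.get("person_present", False),
    "wander": lambda ext: True,
    "recharge": lambda ext: ext.get("docking_station_visible", False),
}

def executable_behaviors(external_state: dict) -> list[str]:
    """Return the behaviors the robot can currently execute."""
    return [name for name, precondition in BEHAVIORS.items()
            if precondition(external_state)]

# With nobody around, interaction with people is not available:
print(executable_behaviors({"person_present": False}))  # ['wander']
```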
Learning
The objective of this decision making system is for the robot to learn how to behave in order to maintain its needs within an acceptable range. For this purpose, the learning process uses a well-known reinforcement learning algorithm, Q-learning, so that the robot learns from its good and bad experiences. Using this algorithm, the robot learns the value of every state-action pair through its interaction with the environment; that is, it learns the value of every action in every possible state. The highest value indicates that the corresponding action is the best one to select in that state.
At the beginning of the learning process these values, called q-values, can all be set to zero, or some of them can be fixed to other values. In the first case the robot learns from scratch; in the second, it starts with some previous information about behavior selection. These initial values are updated during the learning process.
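The update rule behind this is the standard tabular Q-learning rule, sketched below with placeholder hyperparameters (the learning rate, discount factor, and epsilon-greedy exploration policy are not specified in the text):

```python
import random
from collections import defaultdict

class QLearner:
    """Tabular Q-learning: learns a value for every state-action pair."""

    def __init__(self, actions, alpha=0.3, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # all q-values start at zero: learning from scratch
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_action(self, state):
        if random.random() < self.epsilon:  # occasionally explore
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```

To start from prior knowledge rather than from scratch, selected entries of `self.q` can simply be set to nonzero values before learning begins.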
Emotions
Happiness and sadness are used in the learning process as the reinforcement function, and they are related to the wellbeing of the robot. The wellbeing of the robot is defined as a function of its drives and measures the degree of satisfaction of its internal needs: as the values of the robot's needs increase, its wellbeing decreases. In order to define happiness and sadness, we took into account the definition of emotion given by Ortony. In his opinion, emotions occur as appraised (positive or negative) reactions to events. From this point of view, Ortony proposes that happiness occurs because something good happens to the agent, whereas sadness appears when something bad happens. In our system, this translates into relating happiness and sadness to the positive and negative variations of the wellbeing of the robot.
On the other hand, the role of happiness and sadness as the reinforcement function was inspired by Gadanho's work, and also by Rolls, who proposes that emotions are states elicited by reinforcements (rewards or punishments), so that our actions are oriented toward obtaining rewards and avoiding punishments. Following this point of view, in the proposed decision making system happiness and sadness are used as the positive and negative reinforcement functions during the learning process, respectively. Moreover, this approach seems consistent with drive reduction theory, where drive reduction is the chief mechanism of reward.
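Putting these pieces together, a minimal sketch of the reinforcement function could look as follows. The linear wellbeing function is an assumption for illustration; the essential point is only that the reward is the variation of wellbeing, positive for happiness and negative for sadness:

```python
def wellbeing(drives: dict[str, float], max_wellbeing: float = 100.0) -> float:
    """Wellbeing decreases as the robot's needs (drive values) increase.
    The linear form is illustrative, not the paper's exact definition."""
    return max_wellbeing - sum(drives.values())

def reinforcement(previous_drives: dict[str, float],
                  current_drives: dict[str, float]) -> float:
    """Happiness (a rise in wellbeing) acts as positive reinforcement,
    sadness (a drop in wellbeing) as negative reinforcement."""
    return wellbeing(current_drives) - wellbeing(previous_drives)

# Eating reduces the hunger drive, so wellbeing rises and the reward is positive:
print(reinforcement({"hunger": 0.8}, {"hunger": 0.2}))  # positive reward (happiness)
```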