Software architectures
Updated: 19 December 2014 - 12:03pm by M.A. Salichs
The new bio-inspired Automatic-Deliberative (AD) architecture is endowed with a decision making system based on biologically inspired concepts such as drives, motivations, emotions, and self-learning, improving the autonomy of the robot and its life-like behaviour.

The Motivated Automatic-Deliberative architecture

This biologically inspired architecture is based on ideas from modern psychology expressed by Shiffrin and Schneider, and therefore considers two levels: the automatic level and the deliberative level. In the AD architecture, both levels are formed by skills, which endow the robot with different sensory and motor capacities and process information.

  • Deliberative level. In the natural world, humans' deliberative activities are characterized by the fact that they are carried out consciously. Moreover, the temporal dimension is an important property: deliberative processes require a considerable amount of time for analysis. These activities are carried out sequentially, one after another, and it is not possible to carry out more than one deliberative activity at a time.
    In our AD implementation, deliberative skills are based on these activities, and the authors consider that only one deliberative skill can be active at a time.
  • Automatic level. Living beings' automatic activities are characterized by the fact that their actions and perceptions are carried out without awareness of the processes that control them. Examples are the heartbeat, the movement of the hands when writing, or of the legs when walking. An automatic activity can be carried out in parallel with other automatic activities and with a deliberative activity. For example, a person can drive a vehicle and maintain a conversation simultaneously. The complexity of automatic activities varies widely, from the "simplicity" of moving a finger to the complexity of playing a previously memorized sonata on the piano.
    In the AD implementation, the automatic level is mainly formed by skills related to sensors and actuators. Automatic skills can run in parallel, and they can be merged in order to achieve more complex skills.
  • AD Memories. One of the main characteristics of human beings is their ability to acquire and store information from the world and from their own experiences. Memory can be defined as the capacity to recall past experience or information in the present. Based on the memory model proposed by Atkinson and Shiffrin, the AD architecture considers two different memories: the Short-Term Memory and the Long-Term Memory. In our architecture, the Short-Term Memory is a temporary, working memory where transient information is shared among processes and skills. The Long-Term Memory, on the other hand, is a permanent repository of durable knowledge. This knowledge can come from learning, from processing the information stored in the Short-Term Memory, or it can be given a priori. In the AD architecture, this permanent memory holds stable information that is available only to deliberative skills.
The automatic level is linked to modules that communicate with the hardware: sensors and motors. The deliberative level hosts the reasoning processes. Communication between both levels is bidirectional and is carried out through the Short-Term Memory and events.

Events are the mechanism used by the architecture for working in a cooperative way. An event is an asynchronous signal that coordinates processes by being emitted and captured. The design follows the publisher/subscriber pattern, so an element that generates events does not know whether these events are received and processed by other elements or not.
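As an illustration, a minimal Python sketch of such a publisher/subscriber event mechanism is given below; the class, event, and payload names are ours, not the actual AD implementation:

```python
# Minimal publisher/subscriber sketch (illustrative, not the AD code):
# emitters fire events without knowing who, if anyone, receives them.
from collections import defaultdict

class EventManager:
    def __init__(self):
        # event name -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, event, callback):
        self._subscribers[event].append(callback)

    def emit(self, event, payload=None):
        # The emitter is unaware of (and unaffected by) the receivers.
        for callback in self._subscribers[event]:
            callback(payload)

events = EventManager()
events.subscribe("person_detected", lambda p: print("greeting skill notified:", p))
events.emit("person_detected", {"distance_m": 1.2})
```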

The Short-Term Memory is a memory area where the most important data are stored and which can be accessed by different processes. Different data types can be distributed and are available to all elements of the AD architecture. The current value, the previous value, and the date of capture are stored for each item; when new data is written, the previous data is not eliminated but kept as a previous version. The Short-Term Memory allows registering and removing data structures and reading and writing particular data, and several skills can share the same data. It is based on the blackboard pattern.
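A simplified sketch of such a blackboard memory, keeping the current value, the previous value, and a timestamp per item, might look as follows (names and internal structure are illustrative assumptions):

```python
# Blackboard-style Short-Term Memory sketch (hypothetical, simplified):
# each registered item keeps (current, previous, timestamp), so writers
# never destroy the last reading.
import time

class ShortTermMemory:
    def __init__(self):
        self._items = {}  # name -> (current, previous, timestamp)

    def register(self, name, initial=None):
        self._items[name] = (initial, None, time.time())

    def remove(self, name):
        self._items.pop(name, None)

    def write(self, name, value):
        current, _previous, _t = self._items[name]
        # The old value is kept as the previous version, not discarded.
        self._items[name] = (value, current, time.time())

    def read(self, name):
        return self._items[name]  # (current, previous, timestamp)

stm = ShortTermMemory()
stm.register("laser_scan")
stm.write("laser_scan", [0.8, 1.1, 2.3])
print(stm.read("laser_scan"))
```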

The Long-Term Memory, on the other hand, has been implemented as a database and a set of files which contain information such as data about the world, the skills, and the grammars for the automatic speech recognition module.

The essential component of the AD architecture is the skill, which is present at both levels. In terms of software engineering, a skill is a class that hides the data and processes describing the global behavior of a robot task or action. The core of a skill is its control loop, which may be running (the skill is activated) or not (the skill is blocked). Skills can be activated by other skills, by a sequencer, or by the decision making system, and they can return data or events to the activating element or to any other skill interested in them. Skills are characterized as follows (a minimal sketch follows the list):
  • They have three states: ready (just instantiated), activated (running the control loop), and locked (not running the control loop).
  • They have three working modes: continuous, periodic, and event-driven.
  • Each skill is a process. Communication among processes is achieved through the Short-Term Memory and events.
  • A skill represents one or more tasks, or a combination of several skills.
  • Each skill must be subscribed to at least one event and must define its behavior when that event arises.
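The sketch below illustrates these characteristics; it is a simplification under our own assumptions (for instance, it uses a thread where, per the list above, the AD implementation runs each skill as a separate process):

```python
# Sketch of a Skill base class (illustrative, not the real AD code): three
# states, a control loop that runs only while the skill is activated, and
# communication through the shared memory and events sketched above.
import threading
from enum import Enum, auto

class State(Enum):
    READY = auto()      # just instantiated
    ACTIVATED = auto()  # control loop running
    LOCKED = auto()     # control loop stopped

class Skill:
    def __init__(self, stm, events):
        self.stm, self.events = stm, events
        self.state = State.READY

    def activate(self):
        # Thread used here for simplicity; AD skills are processes.
        self.state = State.ACTIVATED
        threading.Thread(target=self._run, daemon=True).start()

    def block(self):
        self.state = State.LOCKED

    def _run(self):
        while self.state is State.ACTIVATED:
            self.control_loop()

    def control_loop(self):
        raise NotImplementedError  # each concrete skill defines its task
```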

The AD architecture allows the generation of complex skills from atomic (indivisible) skills. Moreover, a skill can be reused by different complex skills, which makes the architecture flexible.

The decision making system

In bio-inspired systems, it is assumed that the agent/robot itself must decide its own objectives. Since this is our objective as well, a decision making system based on drives, motivations, emotions, and self-learning is required.

The decision making system communicates bidirectionally with the AD architecture. On one side, the decision making system selects the behavior the robot must execute according to its state; the AD architecture carries out this behavior by activating the corresponding skill or skills (deliberative or automatic). On the other side, the decision making system needs information from the architecture in order to update the internal and external state of the robot.

Drives and Motivations

The term homeostasis means maintaining a stable internal state. This internal state can be described by several variables, each of which must be kept at an ideal level. When the value of one of these variables deviates from its ideal value, an error signal arises: the drive.

In our approach, the autonomous robot has certain needs (drives) and motivations. Following the ideas of Hull and Balkenius, the intensities of the robot's motivations are modeled as a function of its drives and some external stimuli, using Lorenz's hydraulic model of motivation. In Lorenz's model, the internal drive strength interacts with the external stimulus strength: if the drive is low, a strong stimulus is needed to trigger a motivated behavior; if the drive is high, a mild stimulus is sufficient. The general idea is that we are motivated to eat when we are hungry, and also when we have food in front of us even though we do not really need it.

In our approach, once the intensity of each motivation is calculated, the motivations compete among themselves to become the dominant one, and the dominant motivation determines the inner state of the robot.
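The following sketch illustrates this competition under Lorenz-style assumptions; the concrete equation, threshold, and weight are illustrative and are not the authors' published parameters:

```python
# Hedged sketch of the motivation model: each motivation's intensity grows
# with its internal drive and with the related external stimulus. The exact
# functional form below is an assumption for illustration only.
def motivation_intensity(drive, stimulus, w=1.0, threshold=0.1):
    # Below an activation threshold the drive does not motivate at all.
    if drive < threshold:
        return 0.0
    # A high drive needs only a mild stimulus; a strong stimulus can
    # compensate for a low (but above-threshold) drive.
    return drive + w * stimulus

drives = {"energy": 0.7, "social": 0.2}
stimuli = {"energy": 0.0, "social": 0.9}   # e.g. a person is nearby
intensities = {m: motivation_intensity(drives[m], stimuli[m]) for m in drives}
dominant = max(intensities, key=intensities.get)  # motivations compete
print(intensities, "->", dominant)
```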

In this decision making system there are no predefined motivational behaviors. This means that the robot does not necessarily know in advance which behaviors to select in order to satisfy the drive related to the dominant motivation. There is a repertory of behaviors, and which of them can be executed depends on the relation of the robot with its environment, i.e. the external state. For example, the robot will be able to interact with people only as long as someone accompanies it.

Learning

The objective of this decision making system is to have the robot learn how to behave in order to maintain its needs within an acceptable range. For this purpose, the learning process uses a well-known reinforcement learning algorithm, Q-learning, so the robot learns from its good and bad experiences.

Using this algorithm, the robot learns the value of every state-action pair through its interaction with the environment; that is, it learns the value of every action in every possible state. The highest value indicates that the corresponding action is the best one to select in that state.

At the beginning of the learning process these values, called q-values, can all be set to zero, or some of them can be fixed to other values. In the first case the robot learns from scratch; in the second, the robot starts with some previous information about behavior selection. These initial values are updated during the learning process, as sketched below.
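The standard Q-learning update rule is Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). A minimal sketch with illustrative state and action names (the states, actions, and parameter values are our assumptions, not the authors'):

```python
# Standard Q-learning update; q-values default to zero, matching the
# learn-from-scratch case described above.
from collections import defaultdict

ALPHA, GAMMA = 0.3, 0.9   # learning rate and discount factor (illustrative)
Q = defaultdict(float)    # (state, action) -> value, zero by default

def update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

actions = ["approach_person", "go_charge", "idle"]
# One hypothetical experience: charging paid off while the energy drive was high.
update(("energy_high", "docked"), "go_charge", reward=1.0,
       next_state=("energy_low", "docked"), actions=actions)
print(Q[(("energy_high", "docked"), "go_charge")])  # 0.3
```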

Emotions

In addition, happiness and sadness are used in the learning process as the reinforcement function, and they are related to the wellbeing of the robot. The wellbeing of the robot is defined as a function of its drives and measures the degree of satisfaction of its internal needs: as the values of the robot's needs increase, its wellbeing decreases.

In order to define happiness and sadness, we took into account the definition of emotion given by Ortony. In his view, emotions occur as appraised reactions (positive or negative) to events: happiness occurs because something good happens to the agent, whereas sadness appears when something bad happens. In our system, this translates into happiness and sadness being related to the positive and negative variations of the robot's wellbeing.

The role of happiness and sadness as the reinforcement function was inspired by Gadanho's work, but also by Rolls, who proposes that emotions are states elicited by reinforcements (rewards or punishments), so that our actions are oriented towards obtaining rewards and avoiding punishments. Following this point of view, in the proposed decision making system happiness and sadness are used as the positive and negative reinforcement functions during the learning process, respectively. Moreover, this approach is consistent with drive reduction theory, where drive reduction is the chief mechanism of reward.
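As a sketch of how such an emotional reinforcement signal can be derived from the variation of wellbeing (the wellbeing function and the numbers below are illustrative assumptions, not the authors' definition):

```python
# The reward is the variation of the robot's wellbeing, which the text
# defines as decreasing when the values of the drives (needs) increase.
def wellbeing(drives, ideal=100.0):
    return ideal - sum(drives.values())

def reinforcement(prev_drives, drives):
    delta = wellbeing(drives) - wellbeing(prev_drives)
    # Positive variation -> happiness (reward); negative -> sadness (punishment).
    return delta

print(reinforcement({"energy": 40, "social": 20}, {"energy": 10, "social": 20}))
# -> 30.0: the energy drive was reduced, wellbeing rose, happiness reinforces.
```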
