Engagement detection plays an essential role in e-learning,
education, and robotics. In the field of human-agent interaction, it is
of great interest to know the attitude of the human peer towards the
interaction so that the agent can react accordingly. The goal of this
paper is to develop an automatic real-time engagement recognition system
using a combination of non-verbal features (gaze direction, head
position, facial expression and distance between users) extracted using
computer vision techniques. Our system uses a Random Forest classifier
and achieves 86% accuracy on the DAiSEE dataset, improving on
state-of-the-art methods by 22.2% in engagement-level detection
accuracy. Furthermore, using an RGB camera, the system can detect the
user's level of engagement in real time and classify it into four
levels of intensity.
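
As a rough illustration of the classification stage described above, the following minimal sketch trains a Random Forest on per-frame non-verbal features and predicts one of four engagement levels. It assumes scikit-learn and NumPy are available. The feature layout, the synthetic data, the labeling rule, and the helper names `make_synthetic_data` and `train_engagement_classifier` are all hypothetical placeholders, not the feature encoding or data used in this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature layout (an assumption, not the paper's encoding):
# [gaze_yaw, gaze_pitch, head_yaw, head_pitch, head_roll, expression_id, distance_m]
N_FEATURES = 7
N_LEVELS = 4  # four engagement intensity levels


def make_synthetic_data(n_samples=400, seed=0):
    """Generate random placeholder features and labels (not real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, N_FEATURES))
    # Placeholder rule so that labels depend on the features:
    # bin a weighted sum of three columns into four quartile-based levels (0..3).
    score = X[:, 0] + 0.5 * X[:, 2] - 0.5 * X[:, 6]
    y = np.digitize(score, np.quantile(score, [0.25, 0.5, 0.75]))
    return X, y


def train_engagement_classifier(X, y, seed=0):
    """Fit a Random Forest and return it with its held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)


X, y = make_synthetic_data()
clf, acc = train_engagement_classifier(X, y)

# Predict an engagement level (0..3) for a few feature vectors,
# as would be done per frame in a live setting.
levels = clf.predict(X[:5])
```

In a live setting, each row of `X` would instead be produced by the computer vision front end for one camera frame, and `clf.predict` would be called on that row to obtain the current engagement level.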