Interpreting users' messages, both verbal and non-verbal, is essential to achieving natural Human-Robot Interaction. Traditionally, Social Robots, and Care Robots in particular, have relied on interfaces such as voice, touch or images to acquire information from users, although images are usually used to locate them. This manuscript presents the main steps of a machine learning-based approach for detecting dynamic gestures, which provide more information than simple poses: skeleton extraction, feature computation and classification. A total of 123 classification algorithms were employed to analyse the performance and accuracy of the system. To train these classifiers, 30 users were recorded while performing 14 dynamic gestures, yielding 1355 instances of 900 features each. Results indicate that the Random Forest classifier achieves the highest accuracy under cross-validation.
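The feature computation step can be illustrated with a minimal sketch. It does not reproduce the features used in this work: the choice of statistics (mean, standard deviation, minimum, maximum and net displacement per joint and axis), the function name `gesture_features` and the toy data are all assumptions made for illustration. The sketch only shows how a variable-length sequence of skeleton frames might be reduced to a fixed-length vector suitable for a classifier.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) position of one joint
Frame = Sequence[Joint]              # all joints at one time step


def gesture_features(frames: Sequence[Frame]) -> List[float]:
    """Summarise a variable-length skeleton sequence as a fixed-length vector.

    Hypothetical feature set: for every joint and every axis, five statistics
    are computed over time (mean, population standard deviation, minimum,
    maximum, and net displacement, i.e. last value minus first value).
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n_joints = len(frames[0])
    if any(len(frame) != n_joints for frame in frames):
        raise ValueError("all frames must contain the same number of joints")

    features: List[float] = []
    for j in range(n_joints):
        for axis in range(3):
            # Trajectory of one coordinate of one joint across the gesture.
            series = [frame[j][axis] for frame in frames]
            features.extend([
                mean(series),
                pstdev(series),
                min(series),
                max(series),
                series[-1] - series[0],
            ])
    return features


# Toy data (invented): 2 joints over 3 frames; only the second joint moves, along x.
toy_gesture = [
    [(0.0, 1.0, 2.0), (0.0, 0.0, 0.0)],
    [(0.0, 1.0, 2.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 2.0), (2.0, 0.0, 0.0)],
]
vector = gesture_features(toy_gesture)
print(len(vector))  # 2 joints * 3 axes * 5 statistics = 30
```

Because temporal statistics such as the standard deviation and net displacement are zero for a static pose, a vector of this kind can capture motion that a single-frame description cannot. One such vector per recorded gesture, stacked into a matrix, would be the kind of input a classifier such as Random Forest could be trained and cross-validated on.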