Touch gestures occur in many everyday situations of natural human–human interaction: greeting people (handshakes), personal relationships (caresses), moments of celebration or sadness (hugs), and so on. Since robots are expected to become part of our daily life, they should be endowed with the capacity to recognise these touch gestures and the part of their body that has been touched, because the meaning of a gesture may depend on its location. Therefore, this work presents a learning system for both purposes: detecting and recognising the type of touch gesture (stroke, tickle, tap and slap) and localising it. The interpretation of the meaning of the gesture is out of the scope of this paper. Different technologies have been applied to give social robots a sense of touch, commonly relying on a large number of sensors. Instead, our approach uses three contact microphones installed inside some parts of the robot. The audio signals generated when the user touches the robot are sensed by the contact microphones and processed using Machine Learning techniques. We acquired data from sensors installed in two social robots, Maggie and Mini (both developed by the RoboticsLab at the Carlos III University of Madrid), and a real-time version of the whole system has been deployed in the robot Mini. The system allows the robot to sense whether it has been touched, to recognise the kind of touch gesture, and to estimate its approximate location. The main advantage of using contact microphones as touch sensors is that a single microphone can “cover” a whole solid part of the robot. Moreover, these sensors are unaffected by ambient noise, such as human voices, TV or music. Nevertheless, using several contact microphones makes it possible for a touch gesture to be detected by all of them simultaneously, with each one recognising a different gesture. The results show that the system is robust against this phenomenon. Moreover, the accuracy obtained for both robots is about 86%.
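To make the processing pipeline concrete, the following is a minimal sketch of the kind of audio-based touch-gesture classification described above. The abstract does not specify the features or the classifier; MFCC features and an SVM are assumptions chosen purely for illustration, and the synthetic data stands in for real contact-microphone recordings.

```python
# Minimal sketch: classify the audio segment produced by a touch event into one of
# the four gesture classes named in the abstract. MFCCs + SVM are assumptions, not
# the method reported in the paper.
import numpy as np
import librosa
from sklearn.svm import SVC

GESTURES = ["stroke", "tickle", "tap", "slap"]
SAMPLE_RATE = 22050  # assumed sampling rate of the contact microphone


def extract_features(signal: np.ndarray, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Summarise one touch event as the mean MFCC vector of its audio segment."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)


# Placeholder training data: one short synthetic audio segment per labelled touch.
rng = np.random.default_rng(0)
segments = [rng.standard_normal(SAMPLE_RATE // 2) for _ in range(40)]
labels = [GESTURES[i % len(GESTURES)] for i in range(40)]

X = np.stack([extract_features(s) for s in segments])
clf = SVC(kernel="rbf").fit(X, labels)

# At run time, each audio segment captured when the robot is touched would be
# classified in the same way.
new_segment = rng.standard_normal(SAMPLE_RATE // 2)
print(clf.predict(extract_features(new_segment).reshape(1, -1)))
```

In a multi-microphone setup such as the one described, each microphone would run this classification on the same touch event, and the per-microphone predictions would then need to be reconciled, which is the phenomenon the abstract reports the system to be robust against.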