Robust people detection systems increasingly rely on heterogeneous cameras. This paper proposes a hierarchical architecture for robustly detecting people by fusing infrared and visible video. The architecture covers all levels provided by the INT-Horus framework, which was originally designed for monitoring and activity interpretation tasks. INT-Horus serves as the development environment: the approach starts with image segmentation in both the infrared and visible spectra, and the segmentation results are then fused to enhance overall detection performance.
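
As a purely illustrative sketch of the segment-then-fuse idea, the following Python fragment thresholds an infrared frame and a visible frame independently and then combines the two binary masks. Everything in it is an assumption made for illustration: the function names `segment` and `fuse_masks`, the fixed thresholds, and the pixel-wise OR/AND fusion rule are hypothetical and are not taken from INT-Horus or from the fusion method of this paper.

```python
from typing import List

Image = List[List[int]]   # grayscale frame as rows of intensity values
Mask = List[List[bool]]   # binary foreground mask with the same shape


def segment(image: Image, threshold: int) -> Mask:
    """Hypothetical segmentation: mark pixels brighter than the threshold."""
    return [[pixel > threshold for pixel in row] for row in image]


def fuse_masks(ir_mask: Mask, vis_mask: Mask, mode: str = "or") -> Mask:
    """Hypothetical pixel-wise fusion of two registered masks.

    mode="or"  keeps a pixel detected in either spectrum (favours recall);
    mode="and" keeps a pixel detected in both spectra (favours precision).
    """
    if len(ir_mask) != len(vis_mask) or any(
        len(a) != len(b) for a, b in zip(ir_mask, vis_mask)
    ):
        raise ValueError("masks must have the same shape")
    if mode not in ("or", "and"):
        raise ValueError("mode must be 'or' or 'and'")
    if mode == "or":
        return [[a or b for a, b in zip(r1, r2)] for r1, r2 in zip(ir_mask, vis_mask)]
    return [[a and b for a, b in zip(r1, r2)] for r1, r2 in zip(ir_mask, vis_mask)]


# Toy 2x2 frames; the values and thresholds are made up for the example.
infrared = [[10, 200], [30, 220]]
visible = [[5, 90], [120, 15]]

ir_mask = segment(infrared, threshold=128)
vis_mask = segment(visible, threshold=100)
fused = fuse_masks(ir_mask, vis_mask, mode="or")
```

The sketch assumes the two frames are already spatially registered and of equal size; a real system would need to align the infrared and visible views before any pixel-wise combination.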