Smart people-detection systems increasingly rely on heterogeneous cameras. This paper proposes an architecture for robustly detecting people in smart environments by fusing infrared and visible video. The architecture covers all levels provided by the INT3-Horus framework, which was initially designed to perform monitoring and activity interpretation tasks.
INT3-Horus serves as the development environment: the approach starts with image segmentation in both the infrared and visible spectra, and the results are then fused to enhance overall detection performance. The paper describes in detail the INT3-Horus levels selected to implement the new architecture, namely the Acquisition, Segmentation, Fusion, Identification and Tracking levels.
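
The segment-then-fuse idea can be illustrated with a minimal sketch. It is not the INT3-Horus implementation: the function names (`segment`, `fuse`, `detect`), the fixed-threshold segmentation, and the logical-OR fusion rule are all assumptions chosen for illustration, and the two frames are assumed to be spatially registered and of equal size.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel intensities (0-255)
Mask = List[List[int]]   # binary mask: 1 = candidate person pixel, 0 = background


def segment(frame: Frame, threshold: int) -> Mask:
    """Hypothetical stand-in for the Segmentation level: a fixed threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in frame]


def fuse(ir_mask: Mask, vis_mask: Mask) -> Mask:
    """Hypothetical stand-in for the Fusion level.

    Assumption: a pixel is kept if either spectrum flags it (logical OR).
    The fusion rule actually used by the architecture may differ.
    """
    return [[a | b for a, b in zip(ir_row, vis_row)]
            for ir_row, vis_row in zip(ir_mask, vis_mask)]


def detect(ir_frame: Frame, vis_frame: Frame,
           ir_threshold: int = 128, vis_threshold: int = 128) -> Mask:
    """Segment each spectrum independently, then fuse the two masks."""
    ir_mask = segment(ir_frame, ir_threshold)
    vis_mask = segment(vis_frame, vis_threshold)
    return fuse(ir_mask, vis_mask)


# Toy 2x2 frames: the infrared frame sees a warm region at the top-left,
# the visible frame sees a bright region at the bottom-right.
ir_frame = [[200, 10], [10, 10]]
vis_frame = [[10, 10], [10, 220]]
fused_mask = detect(ir_frame, vis_frame)
```

With an OR rule, a region missed in one spectrum can still be recovered from the other, at the cost of also accumulating false positives from both. A stricter rule such as logical AND would make the opposite trade-off.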