This paper presents multimodal image fusion with human pose for detecting abnormal human behaviors in low illumination conditions. Detecting human behaviors in low illumination conditions is challenging due to its limited visibility of the objects of interest in the scene. Multimodal image fusion simultaneously combines visual information in the visible spectrum and thermal radiation information in the long-wave infrared spectrum. We propose an abnormal event detection scheme based on the multimodal fused image and the human poses using the keypoints to characterize the action of the human body. Our method assumes that human behaviors are well correlated to body keypoints such as shoulders, elbows, wrists, hips. In detail, we extracted the human keypoint coordinates from human targets in multimodal fused videos. The coordinate values are used as inputs to train a multilayer perceptron network to classify human behaviors as normal or abnormal. Our experiment demonstrates a significant result on multimodal imaging dataset. The proposed model can capture the complex distribution pattern for both normal and abnormal behaviors.