Video Captioning with Visual and Semantic Features

Incheol Kim; Sujin Lee


Indexed in: KCI, SCOPUS


Korea Information Processing Society, December 2018

JIPS (Journal of Information Processing Systems), Vol. 14, No. 6, pp. 1318-1330 (13 pages)

UCI I410-ECN-0102-2019-500-001456577


Abstract

Video captioning refers to the process of extracting features from a video and generating captions from the extracted features. This paper introduces a deep neural network model and its learning method for effective video captioning. The model uses not only visual features but also semantic features, which effectively express the content of the video. The visual features are extracted using convolutional neural networks, such as C3D and ResNet, while the semantic features are extracted using a semantic feature extraction network proposed in this paper. Further, an attention-based caption generation network is proposed for generating video captions effectively from the extracted features. The performance and effectiveness of the proposed model are verified through various experiments on two large-scale video benchmarks: the Microsoft Video Description (MSVD) dataset and the Microsoft Research Video-To-Text (MSR-VTT) dataset.
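The abstract does not spell out how the attention-based caption generation network combines the two feature types. As a rough illustration only, the sketch below assumes dot-product soft attention over per-frame visual features and simple concatenation with a semantic feature vector; the function names (`attend`, `decoder_input`), the scoring function, and the toy values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frame_features, decoder_state):
    """Return (weights, context) for dot-product soft attention.

    frame_features: list of per-frame visual feature vectors (e.g. CNN outputs).
    decoder_state:  current hidden state of the caption decoder (same dimension).
    Dot-product scoring is an assumption made for this sketch.
    """
    scores = [sum(f * h for f, h in zip(frame, decoder_state))
              for frame in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Context vector: attention-weighted sum of the frame features.
    context = [sum(w * frame[d] for w, frame in zip(weights, frame_features))
               for d in range(dim)]
    return weights, context


def decoder_input(frame_features, semantic_features, decoder_state):
    """Concatenate the attended visual context with the semantic features.

    Concatenation is an assumed fusion strategy, chosen for simplicity.
    """
    _, context = attend(frame_features, decoder_state)
    return context + list(semantic_features)


# Toy data: three frames with 2-d visual features, three hypothetical tag scores.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
semantic = [0.9, 0.1, 0.0]
state = [1.0, 0.0]

weights, context = attend(frames, state)
x = decoder_input(frames, semantic, state)
```

In this toy run, the frames whose features align with the decoder state receive larger attention weights, and the vector `x` passed to the next decoding step has the visual dimension plus the semantic dimension.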

Keywords

Attention-Based Caption Generation

Deep Neural Networks

Semantic Feature

Video Captioning

1. Introduction
2. Related Works
3. Video Captioning Model
4. Evaluation
5. Conclusion
Acknowledgement
References


[Source: NAVER Academic]