Indexed in KCI and SCOPUS
Video Captioning with Visual and Semantic Features
Sujin Lee, Incheol Kim
UCI I410-ECN-0102-2019-500-001456577

Video captioning is the task of extracting features from a video and generating captions from those features. This paper introduces a deep neural network model, and its learning method, for effective video captioning. The model uses not only visual features but also semantic features that effectively express the content of the video. The visual features are extracted using convolutional neural networks such as C3D and ResNet, while the semantic features are extracted using a semantic feature extraction network proposed in this paper. In addition, an attention-based caption generation network is proposed to generate video captions effectively from the extracted features. The performance and effectiveness of the proposed model are verified through various experiments on two large-scale video benchmarks, the Microsoft Video Description (MSVD) and the Microsoft Research Video-To-Text (MSR-VTT) datasets.
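
As a rough illustration of the attention idea mentioned in the abstract, the following minimal sketch computes soft attention weights over per-frame feature vectors and forms a weighted context vector. It is not the authors' network: the dot-product scoring, the function name `soft_attention`, and the toy two-dimensional vectors (standing in for C3D or ResNet features and a decoder state) are all assumptions made for this example, since the abstract does not specify how attention scores are computed.

```python
import math


def soft_attention(frame_features, query):
    """Return (weights, context) for dot-product soft attention.

    frame_features: list of per-frame feature vectors (lists of floats).
    query: a vector of the same length; a hypothetical stand-in for the
           caption decoder's current state.
    """
    # Relevance score of each frame: dot product with the query
    # (an assumed scoring function, chosen for simplicity).
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_features]

    # Numerically stable softmax over the frame scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the frame features.
    dim = len(frame_features[0])
    context = [
        sum(w * frame[i] for w, frame in zip(weights, frame_features))
        for i in range(dim)
    ]
    return weights, context


# Toy input: three "frames" with two-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention(frames, query=[2.0, 0.0])
```

In a captioning decoder, a context vector of this kind would typically be recomputed at every word-generation step, so that different frames can be emphasized for different words.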

1. Introduction
2. Related Works
3. Video Captioning Model
4. Evaluation
5. Conclusion
Acknowledgement
References
[Source: Naver Academic Information]