Topic Classification of Research Paper Using Ensemble Algorithms and BERT

김성현 (Kim Sung Hyun); 김영민 (Kim Young Min)

Korea Management Engineers Society (한국경영공학회), Journal of Korea Management Engineers Society (한국경영공학회지)

Listed in KCI (Korea Citation Index)

Korea Management Engineers Society, March 2024

Journal of Korea Management Engineers Society, Vol. 29, No. 1, pp. 19-33 (15 pages)

DOI 10.35373/KMES.29.1.2

UCI I410-151-25-02-091392361

Abstract

Purpose: To develop and compare models that classify the topic of a research paper from its abstract text.

Methods: Abstracts of 120,000 papers were collected from arXiv, and classification models were developed using ensemble algorithms and BERT. For feature extraction in the ensemble algorithms, TF-IDF, LDA, and Doc2Vec were used to create seven feature sets. A total of 22 models were built from different combinations of feature sets and algorithms, and their performance was compared.

Results: The BERT model performed best, with an accuracy of 0.848 and an F1-score of 0.808. Among the ensemble algorithms, LightGBM performed particularly well, and TF-IDF vectorization, which directly reflects word importance, proved effective.

Conclusion: A model that automatically classifies paper topics from their text lets researchers quickly access the latest work and identify topics that match their research interests. This improves access to information across research fields and may help researchers in diverse domains gain new insights.
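The abstract credits TF-IDF with directly reflecting word importance and reports accuracy and F1-score, but it gives no formulas. As a rough illustration only, here is a standard-library Python sketch of TF-IDF weighting and the two metrics. It is not the authors' pipeline and leaves out the ensemble classifiers (such as LightGBM) and BERT entirely. The whitespace tokenizer, the smoothed IDF variant (borrowed from scikit-learn's default), macro averaging for F1, the hypothetical function names `tfidf_vectors` and `macro_f1`, and the toy data are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. The paper's real preprocessing
    # (stop words, lemmatization, n-grams, ...) is not described in the abstract.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (term -> weight) per doc, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed IDF, same form as scikit-learn's TfidfVectorizer default:
    # idf(t) = ln((1 + n) / (1 + df(t))) + 1, so rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalize so long abstracts do not dominate short ones.
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors, idf


def accuracy(y_true, y_pred):
    # Fraction of papers whose predicted topic matches the true topic.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    # Per-class F1 = 2PR / (P + R), then an unweighted mean over classes.
    # Assumption: macro averaging; the abstract does not state which average was used.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy, made-up abstracts and labels for illustration only (not the paper's arXiv data).
    docs = ["graph neural network", "neural network training", "quantum field theory"]
    vectors, idf = tfidf_vectors(docs)
    print(idf["quantum"] > idf["neural"])  # True: "quantum" appears in fewer documents

    y_true = ["cs", "cs", "math", "physics"]
    y_pred = ["cs", "math", "math", "physics"]
    print(accuracy(y_true, y_pred))            # 0.75
    print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

In practice, scikit-learn's `TfidfVectorizer` and `f1_score` perform the same computations and scale to a corpus of 120,000 abstracts; the hand-written version only makes the weighting and averaging explicit.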

1. Introduction
2. Related Work
3. Research Methods
4. Models
5. Results
6. Conclusion
References
