A Study of a Hybrid Model for Automated Scoring of Korean Language Learners' Writing

Lee Jin (이진)

doi:10.31147/IALL.104.19

KCI-indexed

Designing a Hybrid Model for Automated Essay Scoring of Korean Language Learners

Published by 국제어문학회, March 2025

국제어문, Vol. 104, pp. 525-562 (38 pages)

Abstract

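This study aims to extract multilevel scoring features for designing a hybrid model for the automated scoring of Korean language learners' writing, and to show in detail how each feature affects essay scores. To this end, a hybrid automated scoring model was implemented by combining the shallow linguistic features mainly used in previous research with deep semantic features extracted through a pre-trained deep learning model. In addition, explainable artificial intelligence (XAI) was applied to analyze in detail how the extracted scoring features affect essay scores. In the model performance evaluation, the hybrid model that added semantic and structural features extracted through sentence embeddings achieved higher accuracy than the model that used only shallow features. In particular, "similarity to high-scoring essays," computed from sentence embeddings, contributed most to improving the accuracy of the automated scoring model. The feature importance analysis showed that "error token/type ratio" was the most important predictor of essay scores, followed by "similarity to high-scoring essays" (based on sentence embeddings), "dependency tree node depth," "number of eojeol (word count)," and "ratio of 국제 통용 한국어 (International Standard Korean) vocabulary." Higher values of "error ratio" and "ratio of International Standard Korean beginner-level vocabulary" had a negative effect on scores, whereas higher values of "semantic similarity" and "sentence syntactic complexity" had a positive effect. The significance of this study lies in proposing a hybrid automated scoring model that combines shallow features obtained through feature engineering with semantic and structural features extracted through a pre-trained deep learning model. Such a hybrid model is expected to improve both the predictive accuracy and the explainability of automated scoring. Furthermore, because XAI techniques make it possible to analyze the basis of the scoring model's judgments in detail, automated scoring models could go beyond merely assigning scores and contribute to providing learner-tailored feedback.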
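As an illustration of how shallow and embedding-based features could be combined into one feature set, the following is a minimal Python sketch, not the study's implementation. The function names, the definitions of the error token and error type ratios, the choice of a mean over cosine similarities, and the toy vectors are all assumptions made for this example. In practice the vectors would come from a sentence-embedding model such as Sentence-BERT, and the resulting features would be passed to a regressor such as Random Forest.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_to_high_scoring(essay_vec, high_scoring_vecs):
    """Mean cosine similarity to reference essays (the mean is an assumed aggregation)."""
    sims = [cosine_similarity(essay_vec, ref) for ref in high_scoring_vecs]
    return sum(sims) / len(sims)


def shallow_features(tokens, error_flags):
    """Hypothetical shallow features: token count plus error token and error type ratios."""
    types = set(tokens)
    error_types = {tok for tok, bad in zip(tokens, error_flags) if bad}
    return {
        "token_count": float(len(tokens)),
        # Assumed definition: erroneous tokens divided by all tokens.
        "error_token_ratio": sum(error_flags) / len(tokens),
        # Assumed definition: distinct erroneous forms divided by distinct forms.
        "error_type_ratio": len(error_types) / len(types),
    }


def hybrid_feature_vector(tokens, error_flags, essay_vec, high_scoring_vecs):
    """Shallow features extended with the embedding-based similarity feature."""
    feats = shallow_features(tokens, error_flags)
    feats["similarity_to_high_scoring"] = similarity_to_high_scoring(
        essay_vec, high_scoring_vecs
    )
    return feats


# Toy data: six tokens with one flagged error, and 2-dimensional stand-in embeddings.
tokens = ["저는", "학교에", "가요", "저는", "밥을", "먹어요"]
error_flags = [False, True, False, False, False, False]
essay_vec = [1.0, 0.0]
high_scoring_vecs = [[1.0, 0.0], [0.0, 1.0]]

features = hybrid_feature_vector(tokens, error_flags, essay_vec, high_scoring_vecs)
print(features)
```

Keeping the features in a named dictionary rather than a bare list makes it straightforward to report per-feature importance later, which is the kind of analysis the abstract describes.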

This study aims to design a hybrid model for automated essay scoring (AES) of Korean language learners by extracting multilevel scoring features applicable to the scoring model and analyzing their impact on writing scores. A model was developed by integrating shallow linguistic features used in previous studies and deep semantic features extracted from pre-trained deep learning models. Additionally, Explainable Artificial Intelligence (XAI) techniques were applied to examine the influence of the extracted scoring features on the essay scores. The analysis results indicate that improving the performance of the AES model requires integrating not only shallow features, such as error count and writing length, but also deep semantic and structural features, obtained through pre-trained deep learning models. Specifically, semantic features, derived from sentence embedding, play a significant role in enhancing the accuracy of the AES model. Furthermore, an analysis of the impact of each scoring feature on essay scores revealed that the “error token/type ratio” had the greatest influence, while deep semantic features, such as “similarity to high-scoring essays,” also strongly impacted writing scores. Additionally, sentence syntactic complexity features, such as “average depth of dependency tree nodes,” were also significant. Other high-ranking features included “word count,” related to writing length, and “ratio of beginner-level Korean vocabulary,” reflecting the use of Korean educational vocabulary. This study is significant because the proposed hybrid AES model for Korean language learners integrates feature engineering with deep learning algorithms, which are expected to enhance its accuracy as well as interpretability. Furthermore, the detailed analysis of scoring features not only contributes to improving AES model performance, but also provides valuable insights for establishing evaluation criteria and analyzing assessment results for human raters.
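To make the idea of attributing a predicted score to individual features concrete, here is a small Python sketch that computes exact Shapley values by brute force. It is an illustration only: `toy_score`, its coefficients, the feature names, and the baseline values are hypothetical and are not the paper's model or results. The signs of the coefficients were chosen merely to mirror the direction of effects reported in the abstract.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start from the baseline feature values
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch this feature to its actual value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(f):
    """Hypothetical scoring rule used only for illustration (not the study's model)."""
    return 50.0 - 40.0 * f["error_ratio"] + 30.0 * f["similarity"] + 2.0 * f["tree_depth"]


# Hypothetical essay and baseline (for example, dataset averages).
instance = {"error_ratio": 0.25, "similarity": 0.8, "tree_depth": 4.0}
baseline = {"error_ratio": 0.10, "similarity": 0.5, "tree_depth": 3.0}

contributions = shapley_values(toy_score, instance, baseline)
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
```

The contributions sum to the difference between the score for the essay and the score for the baseline, which is the additive property that SHAP relies on. The brute-force enumeration grows factorially with the number of features, so it is only practical for a handful of them; SHAP implementations estimate the same quantity far more efficiently.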

Keywords

Korean Language Education

Writing

Automated Essay Evaluation

Hybrid Model

Random Forest Regression

Sentence-BERT

Explainable Artificial Intelligence (XAI)

SHAP (Shapley Additive Explanations)

Feature Engineering

Text Similarity

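1. Introduction
2. Previous Research
3. Research Subjects and Methods
4. Performance Evaluation of the Automated Writing Scoring Model
5. Analysis of Scoring Feature Importance Using SHAP
6. Conclusion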
