Coreference Resolution for Korean Using Random Forests (랜덤 포레스트를 이용한 한국어 상호참조 해결)

Seok-won Jeong (정석원); Maengsik Choi (최맹식); Harksoo Kim (김학수)

KCI-indexed

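Korea Information Processing Society (한국정보처리학회), November 2016

KIPS Transactions on Software and Data Engineering (KTSDE), Vol. 5, No. 11, pp. 535-540 (6 pages)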

UCI I410-ECN-0102-2018-500-000288537

Abstract

Coreference resolution is the task of identifying the mentions in a document and clustering the mentions that corefer. It is an essential step for natural language processing applications such as information extraction, event tracking, and question answering. Recently, various coreference resolution models based on machine learning (ML) have been proposed. As is well known, these ML-based models require large amounts of training data manually annotated with coreferent mention tags. However, no usable public data exists for training ML-based models in Korean. We therefore propose an efficient coreference resolution model that requires less training data than other ML-based models. The proposed model identifies coreferent mentions using random forests based on sieve-guided features. In experiments on baseball news articles, the proposed model achieved a CoNLL F1-score of 0.6678, higher than the other ML-based models it was compared with.
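
The two steps named in the abstract, deciding whether a pair of mentions corefers and then grouping the coreferent mentions, can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the example mentions, the two rule-style features (`exact_match`, `head_match`), and the hand-written voting "trees" standing in for a trained random forest are all hypothetical, and treating sieve-guided features as binary indicators of simple matching rules is an assumption made here. The grouping step uses a union-find structure to merge every pair judged coreferent.

```python
from itertools import combinations

# Hypothetical rule-style indicators: each returns 1 if a simple rule fires for a pair.
def exact_match(a, b):
    return int(a == b)

def head_match(a, b):
    # Simplifying assumption: the last token of a mention is its head word.
    return int(a.split()[-1] == b.split()[-1])

def features(a, b):
    return {"exact_match": exact_match(a, b), "head_match": head_match(a, b)}

# Stand-in for a trained random forest: three hand-written "trees" that vote.
# A real forest would learn its trees from annotated mention pairs.
TOY_TREES = [
    lambda f: f["exact_match"] == 1,
    lambda f: f["head_match"] == 1,
    lambda f: f["exact_match"] == 1 or f["head_match"] == 1,
]

def predict_coreferent(a, b):
    f = features(a, b)
    votes = sum(tree(f) for tree in TOY_TREES)
    return votes * 2 > len(TOY_TREES)  # majority vote

def cluster_mentions(mentions):
    """Merge all pairs judged coreferent into clusters of mention indices."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if predict_coreferent(mentions[i], mentions[j]):
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Made-up mentions in the spirit of a baseball news article.
mentions = ["the pitcher", "Kim", "the veteran pitcher", "Kim", "the game"]
clusters = cluster_mentions(mentions)
print(clusters)  # [[0, 2], [1, 3], [4]]
```

Merging positive pairs transitively is only one possible grouping strategy; it is used here because it keeps the sketch short, and the paper itself may cluster mentions differently.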

Keywords

Coreference Resolution

Random Forest

Sieve-Guided Features

1. Introduction
2. Related Work
3. Coreference Resolution Using Random Forests
4. Experiments and Results
5. Conclusion
References
