Improving the Performance of a Statistical Korean Morphological Analyzer

심광섭

KCI candidate journal (KCI 후보)

통계 기반 한국어 형태소 분석기의 성능 개선 (Improving the Performance of Statistical Korean Morphological Analyzer)

심광섭 (Kwangseob Shim)

Sungshin Women's University, Institute of Humanities (성신여자대학교 인문과학연구소), February 2016

人文科學硏究, Vol. 34, pp. 285-316 (32 pages)

UCI I410-ECN-0102-2016-000-000708579

Abstract

Statistical Korean morphological analysis is a new approach in that it does not require a manually built, machine-readable morphological dictionary. Instead, it uses statistical information acquired from a POS-tagged corpus. The acquisition is fully automated, so no human intervention is required. This is the main advantage of the statistical approach. Its main drawback is low precision: the number of false positives is relatively high. To improve precision, this paper proposes a method for filtering false positives. The method introduces two dictionaries, a one-syllable-morpheme dictionary and a josa-eomi dictionary (josa are postpositional particles, eomi are endings), both constructed automatically when the statistical information is collected from the POS-tagged corpus. To evaluate the method, 10-fold cross-validation is performed on the 10-million-eojeol Sejong POS-tagged corpus. The experimental results show that precision improves by 5%.
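The abstract does not spell out the filtering rule, so the following is only a minimal sketch of one plausible reading: both dictionaries are collected in a single pass over a POS-tagged corpus, and an analysis candidate is rejected if it contains a one-syllable morpheme or a josa/eomi that was never observed with that tag. The names `Morph`, `build_dictionaries`, `keep_candidate`, and `precision` are hypothetical, the toy corpus is invented for illustration, and the convention that josa tags start with `J` and eomi tags start with `E` follows the Sejong tag set.

```python
from collections import namedtuple

# A morpheme is a surface form plus a POS tag (hypothetical representation).
Morph = namedtuple("Morph", ["form", "tag"])


def is_josa_eomi(tag):
    # Assumption: Sejong-style tags, where josa tags begin with "J"
    # (JKS, JX, ...) and eomi tags begin with "E" (EP, EF, EC, ...).
    return tag.startswith("J") or tag.startswith("E")


def build_dictionaries(tagged_corpus):
    """Collect both dictionaries in one pass over the tagged corpus."""
    one_syllable, josa_eomi = set(), set()
    for eojeol in tagged_corpus:
        for m in eojeol:
            if is_josa_eomi(m.tag):
                josa_eomi.add((m.form, m.tag))
            elif len(m.form) == 1:  # one precomposed Hangul syllable
                one_syllable.add((m.form, m.tag))
    return one_syllable, josa_eomi


def keep_candidate(candidate, one_syllable, josa_eomi):
    """Assumed rule: reject a candidate containing an unseen josa/eomi
    or an unseen one-syllable morpheme."""
    for m in candidate:
        if is_josa_eomi(m.tag):
            if (m.form, m.tag) not in josa_eomi:
                return False
        elif len(m.form) == 1 and (m.form, m.tag) not in one_syllable:
            return False
    return True


def precision(candidates, gold):
    """Fraction of proposed analyses that are in the gold set."""
    if not candidates:
        return 0.0
    return sum(1 for c in candidates if c in gold) / len(candidates)


# Toy training corpus (invented): each eojeol is a list of morphemes.
corpus = [
    [Morph("나", "NP"), Morph("는", "JX")],
    [Morph("학교", "NNG"), Morph("에", "JKB")],
    [Morph("가", "VV"), Morph("ㄴ다", "EF")],
]
one_syllable, josa_eomi = build_dictionaries(corpus)

# Three candidate analyses for the eojeol "나는"; only c1 is correct.
c1 = (Morph("나", "NP"), Morph("는", "JX"))
c2 = (Morph("나", "VV"), Morph("는", "ETM"))
c3 = (Morph("나는", "NNG"),)
candidates = [c1, c2, c3]
gold = {c1}

kept = [c for c in candidates if keep_candidate(c, one_syllable, josa_eomi)]
```

In this toy case the filter drops `c2`, because `나` was never seen as a verb stem, so one false positive disappears and precision rises while the correct analysis is kept. The sketch covers only the filtering idea; it does not reproduce the statistical analyzer or the 10-fold cross-validation setup.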

Keywords

Korean morphology (한국어 형태론)

morphological analysis (형태소 분석)

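1. Introduction
2. Syllable-Based Korean Morphological Analysis
3. Statistical Korean Morphological Analysis
4. Methods for Improving the Performance of Statistical Korean Morphological Analysis
5. Experiments and Results
6. Conclusion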

[Source: Naver Academic (네이버학술정보)]