Penn Korean Treebank: Development and Evaluation

Chung Hye Han, Na Rae Han, Eon Suk Ko, Martha Palmer, Heejong Yi
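  • Publisher: 한국언어정보학회 (Korean Society for Language and Information)
  • Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0
  • Type: Serial publication
  • Date: January 2002
  • Pages: 69-78 (10 pages)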

Abstract

This paper discusses issues in building a 54-thousand-word Korean Treebank with phrase structure annotation, and in developing annotation guidelines based on the morpho-syntactic phenomena represented in the corpus. It also presents the methods employed for quality control and the evaluation of the Treebank.

UCI (KEPA): I410-ECN-0102-2015-700-001895667

Publication information

  • : Language and literature > Linguistics
  • : Other
  • : Academic journal
  • : Serial publication
  • : 1983-2002
  • : 265


Papers in vol. 2002, no. 0 (January 2002)

1. Robust Syntactic Annotation of Corpora and Memory-Based Parsing

Author: Erhard W Hinrichs

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: p. 1 (1 page)

This talk provides an overview of current work in my research group on the syntactic annotation of the Tübingen corpus of spoken German and of the German Reference Corpus (Deutsches Referenzkorpus: DEREKO) of written texts. Morpho-syntactic and syntactic annotation, as well as annotation of function-argument structure, is performed automatically for these corpora by a hybrid architecture that combines robust symbolic parsing based on finite-state methods ("chunk parsing" in the sense of Abney) with memory-based parsing (in the sense of Daelemans). The resulting robust annotations can be used by theoretical linguists, who are interested in large-scale empirical data, and by computational linguists, who need training material for a wide range of language technology applications. To aid retrieval of annotated trees from the treebank, a query tool, VIQTORYA, with a graphical user interface and a logic-based query language has been developed. VIQTORYA allows users to query the treebanks for linguistic structures at the word level, at the level of individual phrases, and at the clausal level.

2. A Simple Syntax for Complex Semantics

Author: Kiyong Lee

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: pp. 2-27 (26 pages)

As part of a long-range project that aims at establishing database-theoretic semantics as a model of computational semantics, this presentation focuses on the development of a syntactic component for processing strings of words or sentences to construct semantic data structures. For design and modeling purposes, the present treatment is restricted to the analysis of some problematic constructions of Korean involving semi-free word order, conjunction and temporal anchoring, and adnominal modification and antecedent binding. The present work relies heavily on Hausser's (1999, 2000) SLIM theory of language, which is based on surface compositionality, time-linearity, and two other conditions on natural language processing. Time-linear syntax for natural language has been shown to be conceptually simple and computationally efficient. The associated semantics is complex, however, because it must deal with situated language involving interactive multi-agents. Nevertheless, by processing input word strings in a time-linear mode, the syntax can incrementally construct the necessary semantic structures for relevant queries and valid inferences. The fragment of Korean syntax will be implemented in Malaga, a C-type implementation language that was enriched for both programming and debugging purposes and made particularly suitable for implementing Left-Associative Grammar. This presentation will show how the system of syntactic rules with constraining subrules processes Korean sentences in a step-by-step, time-linear manner to incrementally construct semantic data structures that mainly specify relations with their argument, temporal, and binding structures.

3. Identification of Chinese Personal Names in Unrestricted Texts

Authors: Lawrence Cheung, Benjamin K Tsou, Maosong Sun

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: pp. 28-35 (8 pages)

Automatic identification of Chinese personal names in unrestricted texts is a key task in Chinese word segmentation and, if not properly addressed, can affect other NLP tasks such as word segmentation and information retrieval. This paper (1) demonstrates the problems of Chinese personal name identification in some IT applications, (2) analyzes the structure of Chinese personal names, and (3) presents the relevant processing strategies. The geographical differences in Chinese personal names between Beijing and Hong Kong are highlighted at the end. The comparison shows that variation in names across different Chinese communities is a critical factor in designing a Chinese personal name identification algorithm.

4. Mismatches in Korean Copula Constructions and Linearization Effects

Authors: Chan Chung, Jong Bok Kim

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: pp. 36-49 (14 pages)

One main complexity of the copula constructions concerns a mismatch between morphology and syntactic constituency: the copula seems to form a morphological unit with the immediately preceding element, whereas in terms of syntax the copula appears to take this element as its syntactic complement. In capturing such mismatches, we show that the copula is treated as an independent verb at the level of tectogrammatical structure (or syntax tree), but as a bound morpheme at the level of phenogrammatical structure (or domain tree), in the terms of Dowty 1992 (or Reape 1994). This paper, adopting the notion of DOMAIN in HPSG, shows that copula constructions are a subtype of compacting-constructions. These constructions compact the domain value of the copula and that of its preceding element into one domain unit, eventually making it inert to syntactic phenomena such as scrambling, deletion, and pro-form substitution. This construction-based approach provides a clean analysis of the formation of the copula construction and related phenomena.

5. Heuristic-based Korean Coreference Resolution for Information Extraction

Authors: Euisok Chung, Soojong Lim, Bo Hyun Yun

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: pp. 50-58 (9 pages)

Information extraction delimits in advance, as part of the task specification, the semantic range of the output, and filters information from large volumes of text. The most representative words of a document are named entities and pronouns. It is therefore important to resolve coreference in order to extract meaningful information. Coreference resolution finds the named entities that co-refer to real-world entities in the documents. Its results are used for named entity detection and template generation. This paper presents a heuristic-based approach to coreference resolution in Korean. We constructed the heuristics by expanding them gradually using the corpus, and derived salience factors of antecedents as the importance measure in Korean. Our approach consists of antecedent selection and antecedent weighting. We used three kinds of salience factors to weight each antecedent of the anaphor. The experimental result shows 80% precision.

6. On Negative Imperatives in Korean

Authors: Chung Hye Han, Chung Min Lee

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: pp. 59-68 (10 pages)

In this paper, we address two questions concerning negative imperatives in Korean: (i) what is the morpho-syntactic nature of mal in negative imperatives? and (ii) why is it impossible to form negative imperatives with short negation an? We argue that the clause structure of imperatives includes a projection of deontic modality and a projection of an imperative operator encoding illocutionary force, and that mal is a lexicalization of long negation and deontic modality. We then propose that a negative imperative with short negation is ruled out because such a construction maps onto an incoherent interpretation, which can be spelled out as "I direct you to bring about a negative state or a negative event."

7. Penn Korean Treebank: Development and Evaluation

Authors: Chung Hye Han, Na Rae Han, Eon Suk Ko, Martha Palmer, Heejong Yi

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: pp. 69-78 (10 pages)

This paper discusses issues in building a 54-thousand-word Korean Treebank with phrase structure annotation, and in developing annotation guidelines based on the morpho-syntactic phenomena represented in the corpus. It also presents the methods employed for quality control and the evaluation of the Treebank.

8. A Deterministic Method for Structural Analysis of Compound Words in Japanese

Authors: Dongli Han, Takeshi Ito, Teiji Furugori

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: pp. 79-91 (13 pages)

Structural analysis of compound words is a necessary and important process in natural language processing. Proposed here is a corpus- and statistics-based method for the structural analysis of compound words in Japanese. We determine the structure of a compound word by using an Internet corpus and calculating the strength of word association among its constituent words. Experiments with 5-, 6-, 7-, and 8-kanji compound words show that our method works well and that its performance is better than that of other comparable studies.

9. Implicit Adjuncts: The Cases of Degree Modifiers in Japanese and English

Authors: Akira Ikeya, Hisako Ikawa

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: pp. 92-102 (11 pages)

The issue of adjuncts has long been a neglected field of linguistic study, whether syntactic or semantic. It is only in Pustejovsky (1995) that we find a brief mention of adjuncts. In addition to what the author calls true arguments, default arguments, and shadow arguments, he sets up a class of true adjuncts, citing the sentence "Mary drove down to New York on Tuesday." We take up a small lexical item, sugiru, in Japanese, and argue that we should posit the notion of implicit adjuncts in describing its properties. Throughout the discussion that follows, we demonstrate how the notion is independently motivated irrespective of which linguistic theory we adopt.

10. Type Construction of Nouns with the Verb ha- "do"

Authors: Seohyun Im, Chungmin Lee

Publisher: 한국언어정보학회 (Korean Society for Language and Information); Publication: 국제 워크샵 (International Workshop), vol. 2002, no. 0; Year: 2002; Pages: pp. 103-112 (10 pages)

This paper aims to explain the combination of certain nouns with the verb ha- "do". Although the verb ha- "do" normally takes an event type argument, it also takes some substantival nouns such as paiolin "violin", umsikcem "restaurant", and so on. A substantival noun undergoes type shifting because the governing verb ha- "do" coerces an entity type noun to an event reading, taking the missing information from the qualia of the entity type noun. In addition, some nouns like ppallay "laundry" are dot objects. A verb taking a dot object selects the appropriate type among the multiple subtypes of the dot object. A type pumping operation makes that selection possible.

