The AI Perspective on Buddhism - A Comparative Analysis of Buddhist Knowledge in Large Language Model -

Gil Wanje (길완제); Han Kwanghyun (한광현); Shin Insu (신인수); Jang Hwanyoung (장환영)

doi:10.22255/JKABS.117.05

KCI-listed journal

Korean Association for Buddhist Studies (한국불교학회), February 2026

한국불교학 (Hanguk Bulgyohak), Vol. 117, pp. 141-180 (40 pages)

Abstract

This study empirically examines how accurately and reliably large language models (LLMs) can understand and reproduce Buddhism as a specialized, culturally embedded system of knowledge. To this end, an evaluation dataset of 90 items was constructed from The Predicted Question Book for the Buddhist Missionary Examination (『포교사 고시 예상문제집』), published by the Buddhist Promotion Bureau of the Jogye Order of Korean Buddhism. Reflecting the format of the actual missionary examination, the dataset was organized into two sessions of 45 questions each (40 multiple-choice and 5 open-ended). The models evaluated were OpenAI GPT-5.2, Google Gemini 2.5 Pro, Anthropic Claude Sonnet 4.5, and DeepSeek Chat. All experiments were run through the models' APIs with the temperature fixed at 0, and each model was required to return structured JSON containing its answer, its reasoning, and a self-reported confidence score. The Gemini and Claude models achieved high accuracy but showed limitations in confidence calibration, while the OpenAI model showed a relatively stable distribution of confidence. The DeepSeek model, by contrast, combined consistently high confidence with low accuracy, which made the risk of high-confidence errors especially pronounced. These findings indicate that LLMs perform strongly at the level of knowledge (知, ji), that is, the conceptual explanation of Buddhist doctrine, but have structural limitations at the level of wisdom (慧, hye), which is formed in the context of ritual, transmission, and lived experience. The study thus empirically reaffirms the Buddhist epistemological distinction between knowledge and wisdom in the contemporary AI setting, and argues that artificial wisdom should be redefined not as wisdom belonging to AI itself but as an expedient, instrumental infrastructure that supports human practice and reflection.
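The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the JSON field names (`answer`, `reasoning`, `confidence`), the 0-to-1 confidence scale, the 0.9 threshold for a "high-confidence" error, and the ten-bin expected calibration error are all assumptions chosen for illustration, and exact-match scoring would apply only to the multiple-choice items, not the open-ended ones. The sample replies at the bottom are made up.

```python
import json
from dataclasses import dataclass


@dataclass
class ModelAnswer:
    answer: str        # selected option (assumed field name)
    reasoning: str     # model's stated rationale (assumed field name)
    confidence: float  # self-reported confidence, assumed to lie in [0, 1]


def parse_response(raw: str) -> ModelAnswer:
    """Parse one structured JSON reply and validate the confidence range."""
    data = json.loads(raw)
    conf = float(data["confidence"])
    if not 0.0 <= conf <= 1.0:
        raise ValueError(f"confidence out of range: {conf}")
    return ModelAnswer(str(data["answer"]), str(data["reasoning"]), conf)


def summarize(answers, gold, high_conf=0.9):
    """Accuracy, mean confidence, their gap, and the high-confidence error rate."""
    if not answers or len(answers) != len(gold):
        raise ValueError("answers and gold must be non-empty and equal in length")
    n = len(answers)
    correct = [a.answer == g for a, g in zip(answers, gold)]
    accuracy = sum(correct) / n
    mean_conf = sum(a.confidence for a in answers) / n
    # A high-confidence error is a wrong answer given with confidence >= high_conf.
    hc_errors = sum(
        1 for a, ok in zip(answers, correct) if not ok and a.confidence >= high_conf
    )
    return {
        "accuracy": accuracy,
        "mean_confidence": mean_conf,
        "overconfidence_gap": mean_conf - accuracy,
        "high_conf_error_rate": hc_errors / n,
    }


def expected_calibration_error(answers, gold, n_bins=10):
    """Weighted mean |confidence - accuracy| over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for a, g in zip(answers, gold):
        idx = min(int(a.confidence * n_bins), n_bins - 1)
        bins[idx].append((a.confidence, a.answer == g))
    n = len(answers)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(avg_conf - acc)
    return ece


if __name__ == "__main__":
    # Made-up replies for illustration only; not data from the study.
    raw_replies = [
        '{"answer": "2", "reasoning": "example", "confidence": 0.95}',
        '{"answer": "1", "reasoning": "example", "confidence": 0.95}',
    ]
    gold_labels = ["2", "3"]
    parsed = [parse_response(r) for r in raw_replies]
    print(summarize(parsed, gold_labels))
    print(expected_calibration_error(parsed, gold_labels))
```

Under these assumptions, a model with high mean confidence but low accuracy shows up as a large `overconfidence_gap` and a high `high_conf_error_rate`, which is one way to quantify the high-confidence error pattern the abstract attributes to the DeepSeek model.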

Keywords

Large Language Models

Buddhist Epistemology

Generative Artificial Intelligence

Buddhist Knowledge

Artificial Wisdom

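Ⅰ. Introduction
Ⅱ. Theoretical Background and Related Work
Ⅲ. Research Method and Experimental Setup
Ⅳ. Comparative Analysis of Buddhist Knowledge by Model
Ⅴ. Conclusion and Implications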
