Research on Efficient Preprocessing Method of Unstructured Data for AI Transformation

Korean title: AX(AI Transformation)를 위한 비정형 데이터의 효율적인 전처리 방안 연구

안필용, 이충형

The Institute of Internet, Broadcasting and Communication (한국인터넷방송통신학회), 2025

The journal of the institute of internet, broadcasting and communication : JIIBC, Vol. 25, No. 3, pp. 183-192 (10 pages)

DOI KISTI1.1003/JNL.JAKO202518339603600

Abstract

This study examines the concept, architecture, and application of AI-based document preprocessing technologies, which are growing in importance as unstructured data expands rapidly, and draws technical implications and policy response strategies from a range of cases. In particular, it quantitatively analyzes how preprocessing affects response accuracy, processing speed, and contextual understanding in recent AI transformation settings such as Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). To this end, cases are classified around four core technologies: Optical Character Recognition (OCR), Natural Language Processing (NLP), data de-identification, and LLM-based document structuring. The practicality and effectiveness of preprocessing are then verified through real deployments in the finance and public sectors. The results indicate that preprocessing is not merely a preparatory step: it contributes directly to end-to-end AI performance and can serve as a core enabling technology for future AI transformation strategies. The study further proposes that the sustainability and industrial diffusion of these technologies are maximized when policy support is provided in parallel, including R&D investment, data standardization, personal information protection, and collaboration among industry, academia, and research institutes.

Keywords

Artificial Intelligence, AI Transformation, Unstructured Data, Large Language Model, Retrieval-Augmented Generation
