V4I2-1971.pdf

AI-generated Abstract

This paper addresses the challenges of extracting tables from Portable Document Format (PDF) files, focusing on the difficulties posed by the different PDF types: True PDFs, Scanned PDFs, and Searchable PDFs. It proposes an automated solution that converts searchable PDFs into XML and extracts the relevant data for storage in a NoSQL database. The approach streamlines data processing for educational institutions, reducing manual effort and making the analysis of results more efficient.
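
The XML-to-database step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the XML layout (`document`, `table`, `row`, `cell` elements), the sample values, and the function name `table_rows_to_documents` are all assumptions made for illustration, and the PDF-to-XML conversion itself is taken as already done. The sketch only shows how table rows in such XML might be turned into JSON-like documents of the kind a NoSQL store accepts.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML standing in for the output of a PDF-to-XML converter.
# The element names and the values are invented for this example.
SAMPLE_XML = """
<document>
  <table>
    <row><cell>Roll No</cell><cell>Name</cell><cell>Marks</cell></row>
    <row><cell>101</cell><cell>A. Kumar</cell><cell>78</cell></row>
    <row><cell>102</cell><cell>B. Singh</cell><cell>64</cell></row>
  </table>
</document>
"""


def table_rows_to_documents(xml_text):
    """Turn each table's data rows into dicts keyed by that table's header row."""
    root = ET.fromstring(xml_text.strip())
    documents = []
    for table in root.iter("table"):
        # Read every row as a list of cell strings (empty cells become "").
        rows = [
            [(cell.text or "").strip() for cell in row.findall("cell")]
            for row in table.findall("row")
        ]
        if not rows:
            continue
        # Assumption: the first row of each table holds the column names.
        header, data_rows = rows[0], rows[1:]
        for values in data_rows:
            documents.append(dict(zip(header, values)))
    return documents


documents = table_rows_to_documents(SAMPLE_XML)
print(json.dumps(documents, indent=2))
```

Each resulting dictionary is schema-free, so it could be handed to a document store's bulk-insert call (with MongoDB, for example, through pymongo's `insert_many`). The paper does not say which NoSQL database it uses, so that choice is left open here.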