This paper presents a new two-phase pattern (2PP) discovery technique for information extraction. 2PP consists of orthographic pattern discovery (OPD) and semantic pattern discovery (SPD) where the OPD determines the structural features from an identified region of a document and the SPD discovers a dominant semantic pattern for the region via inference, apposition and analogy. Then the discovered pattern is applied back into the region to extract required data items through pattern matching. We evaluated 2PP using 6500 data items and obtained effective result.
John Shepherd hasn't uploaded this paper.
Create a free Academia account to let John know you want this paper to be uploaded.