A Survey On Event Extraction From Webpage
Abstract— Event extraction (EE) is one of the most important research areas in the field of Natural Language Processing (NLP). Numerous significant events occur every day all over the world, and they are published in various media outlets with varying narrative styles. A primary job in EE is to determine whether events in the real world have been reported in articles and posts, driven by the 4Ws (what, who, when, and where). An event can explain a complex relationship between people, places, actions, and objects. An event-centered model captures the dynamic and semantic aspects of event facts. An updated and comprehensive survey is needed due to the proliferation of methods, datasets, and evaluation metrics in the literature. This paper surveys the extraction of events from websites and defines event types and their applications in several different fields. In addition, the study presents the strengths and weaknesses of event extraction systems for different types of models.

Keywords— Event extraction, natural language processing (NLP), event corpus, web page.

I. INTRODUCTION
An event is something that happens. It can often be described as a change of state and is specific to its participants [1]. Event extraction aims to find event instances in published texts and, if they exist, to identify the type of each event together with all of its attributes and participants. Although different types of events can be defined by their different arguments, a simple event summary by text extraction can be defined as obtaining a structured event representation from unstructured natural language to help answer the "5W1H" questions. These questions ask "who, when, where, what, why," and "how" about events in the real world, drawing on a variety of text sources such as social media posts, news articles, and so on [2].
Information retrieval is considered an important task in Natural Language Processing (NLP). Event extraction is one such task, with many different applications in various fields [3]. For example, structured events can be used directly to expand the knowledge on which further logical reasoning and inference can be made [4].

II. EVENT EXTRACTION TASK
Event extraction (EE) requires Named Entity Recognition (NER) and Relation Extraction (RE) tasks. EE, as a multipurpose subject, is highly related to statistics, computer science, and NLP [5].
An event in general can be defined as "a specific occurrence" of something happening at a specific place and time, to which one or more participants are related. Extracting events involves two levels of extraction: the document level and the sentence level. These two levels allow the event extractor to achieve its goal, which requires the ability to answer the questions posed to the extraction tool, including When, Who, Where, etc. [6].
Event terms and definitions include the following [7]:
Event mention: a phrase or sentence in which an event, including a trigger and arguments, is described.
Event trigger: a keyword, a verb or a noun, that clearly describes an event that has occurred.
Event argument: an entity mention, temporal expression, or value that serves as a participant or attribute with a specific role in an event.
Argument role: the relationship between an argument and the event in which it participates.
Event extraction is intended to extract a characterization of an event from the text, defined by a set of entities, each associated with a specific role in the event. Some techniques employ data-driven approaches, others employ knowledge-driven approaches, and still others employ hybrid approaches.

III. EVENT EXTRACTION CORPORA
Event extraction corpora are annotated by domain experts or professionals and are used to train or evaluate models. However, because the annotation process is cost-prohibitive, many public corpora are small in size and have low coverage, such as the ACE event corpus, the TAC-KBP corpus, the TDT corpus, and other domain-specific corpora [2,8]. They are based on two major types of extraction domains, as shown in Fig. 1.
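
The event terminology of Section II (event mention, event trigger, event argument, and argument role) can be made concrete with a small data structure. The following Python sketch is illustrative only: the class names, the `locate` helper, the example sentence, and the labels "Attack", "Attacker", "Target", "Place", and "Time" are assumptions chosen for the example and are not taken from any cited corpus or system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class EventTrigger:
    text: str               # keyword (verb or noun) that signals the event
    span: Tuple[int, int]   # character offsets in the sentence, end exclusive
    event_type: str         # hypothetical event type label


@dataclass(frozen=True)
class EventArgument:
    text: str               # entity mention, temporal expression, or value
    span: Tuple[int, int]   # character offsets in the sentence, end exclusive
    role: str               # argument role: how the argument relates to the event


@dataclass
class EventMention:
    sentence: str           # the phrase or sentence describing the event
    trigger: EventTrigger
    arguments: List[EventArgument] = field(default_factory=list)

    def answer(self, role: str) -> List[str]:
        """Return the texts of all arguments that fill the given role."""
        return [a.text for a in self.arguments if a.role == role]


def locate(sentence: str, phrase: str) -> Tuple[int, int]:
    """Hypothetical helper: character span of the first match of `phrase`."""
    start = sentence.index(phrase)
    return (start, start + len(phrase))


# Invented example sentence, used only to populate the structure.
sentence = "Protesters attacked the embassy in Cairo on Friday."
mention = EventMention(
    sentence=sentence,
    trigger=EventTrigger("attacked", locate(sentence, "attacked"), "Attack"),
    arguments=[
        EventArgument("Protesters", locate(sentence, "Protesters"), "Attacker"),
        EventArgument("the embassy", locate(sentence, "the embassy"), "Target"),
        EventArgument("Cairo", locate(sentence, "Cairo"), "Place"),
        EventArgument("Friday", locate(sentence, "Friday"), "Time"),
    ],
)
```

With this representation, the "where" and "when" questions reduce to role lookups such as `mention.answer("Place")` and `mention.answer("Time")`; a real extractor would have to predict the trigger, the argument spans, and the roles rather than receive them by hand.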
8th International Conference on Contemporary Information Technology and Mathematics (ICCITM2022), Mosul University, Mosul, Iraq
performance with low time and cost. It achieved an F1-score of 97.59% and an accuracy of 89.96%. The results showed higher accuracy on the NY Times and NY Post datasets, which exceeded 99.15% and 98.89%.

Yang et al. (2018) [17] presented a framework that detects event mentions and then extracts events from financial news at the document level. The authors presented Document-level Chinese Financial Event Extraction (DCFEE), which generates more labeled data for extracting Chinese financial events. The results showed that the system reached up to 94.5 accuracy for mention labeling and 94.08 for automatic label generation.

Björne and Salakoski (2018) [18] developed a Convolutional Neural Network (CNN) for event and relation extraction. The input text is converted to a linear representation, the information is encoded by vector space embeddings, and dependency path embeddings are used to encode the parse graph. The CNN is integrated into the open-source Turku Event Extraction System (TEES). The system was evaluated on 12 event, relation, and NER corpora and showed good performance across the different corpora.

Li et al. (2019) [19] presented knowledge base (KB)-driven tree-structured long short-term memory networks (Tree-LSTM) and implemented two new features: (i) dependency structures and (ii) entity properties. The approach was evaluated on the BioNLP shared task using the Genia dataset. It achieved 86% accuracy for simple events.

Yang et al. (2019) [20] presented an approach that extracts events occurring in plain text. The approach consists of two stages: first trigger extraction is performed, then argument extraction. The authors presented a Pre-trained Language Model based Event Extractor (PLMEE). The results showed that the approach reached 81.1 for trigger extraction and 58.9 for argument extraction.

Zhang et al. (2019) [21] presented an entity and event extraction approach that used generative adversarial imitation learning. A Q-learning scanner scans the text to detect the event boundaries, triggers, and entities. The extractor then links the triggers to the entities and determines the argument roles. A GAN algorithm is used to train the framework on the extracted event features. The framework achieved 85.2% accuracy and an 80.8 F1-score.

Felix Hamborg et al. (2019) [22] presented a system that used grammar rules to build specific rules for extracting the relevant phrases from English articles. The system answered the 5W1H questions to determine the main event in an article. The results showed the system's capability to determine the main event from the answers to just the first four W questions, for which it reached 82% accuracy.

Fisichella and Ceroni (2021) [23] presented a basis for a new class of evolution-aware entity-based enrichment algorithms to detect events in Wikipedia. They assumed that this would improve the accessibility and timeliness of Wikipedia's entity retrieval. Comprehensive experiments were conducted on a dataset of 1.8 million articles. The approach relied on a supervised model that can detect events in a non-annotated corpus. It reached 70% accuracy and was more accurate than other related methods.

Matthew Crittenden (2021) [24] proposed a causal network to extract relevant event-causal structures from ConceptNet and Wikipedia. The proposed network uses event-causal attributes extracted by a bidirectional transformer encoder to effectively capture long-range interdependencies. This work increases the complexity of the task by classifying entity-type arguments as well as complex argument types. The model used Hugging Face's bert-base-multilingual-cased model, which had been pre-trained on 104 different languages. It was trained for 20 epochs with a mini-batch size of 4 and a maximum sequence length of 512 on a single Tesla K40c.

Dilek Küçük (2022) [25] proposed Energy Monitoring via Information Extraction (EneMonIE), a Web-based semantic system for monitoring current energy trends using automatic, continuous, and guided EE from various forms of media available on the Web. The sources included online news videos, online news articles, social media texts, open-access scientific papers, and technical reports, as well as the many digital energy data made available by energy institutions. EneMonIE is an important source of concise information for decision-makers, power generation, transmission, and distribution system operators, energy research centers, investors, and related entrepreneurs, as well as for academics and students. The system offers various data sources, automatic text processing capabilities, and display facilities that are open for public use.

Jacobs et al. (2022) [26] presented the SENTiVENT scheme for detecting events in economic news articles. It used event triggers, participant arguments, event co-reference, and event attributes such as type, subtype, negation, and modality. The results showed that the scheme obtained a 59% F1-score on a dataset consisting of 18 event types and 64 subtypes. Training was performed on 6,200 events in 288 documents.

Meisin et al. (2022) [27] presented a system to extract events from English crude oil news. A seed set of 175 news articles was used, and a subset of 25 articles served as the adjudicated reference test set. The resulting corpus has 425 news articles with approximately 11k annotated events. Basic event extraction models were trained to label the data. A special dataset containing oil-related triggers and arguments was used. The overall evaluation results showed the high performance of the proposed system.

Jianwei Lv and others [25] proposed an advanced multi-task learning framework named TNC, based on their original concept that the Trigger is Non-Central, in which event argument extraction is performed synchronously with event trigger extraction. Using label representations and an auxiliary task called Sentence Event Identification (SEI), TNC extracts multiple event triggers and arguments simultaneously. A special symbol is also designed to merge the representations of candidate arguments over the Transformer encoder. Experimental results showed that the model achieves state-of-the-art performance compared with other models, with higher effectiveness and adaptability.
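
Many of the systems surveyed above report precision, recall, or F1-scores for trigger and argument extraction. As a point of reference, the following Python sketch shows one common way such scores are computed, assuming that a prediction counts as correct only when its span and label exactly match a gold annotation. The function name, the `(start, end, label)` tuple layout, and the sample annotations are hypothetical and are not taken from any of the cited papers, whose evaluation protocols may differ.

```python
from typing import Dict, Iterable, Tuple

# One annotation: (start offset, end offset, label). Hypothetical layout.
Item = Tuple[int, int, str]


def precision_recall_f1(gold: Iterable[Item],
                        predicted: Iterable[Item]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over sets of annotations."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    # Guard against division by zero when either side is empty.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented sample: two gold triggers, three predicted triggers.
gold_triggers = [(11, 19, "Attack"), (40, 46, "Meet")]
predicted_triggers = [
    (11, 19, "Attack"),     # correct span and label
    (40, 46, "Transport"),  # correct span, wrong label: counted as an error
    (50, 55, "Die"),        # spurious prediction
]
scores = precision_recall_f1(gold_triggers, predicted_triggers)
```

In this sample one of three predictions is correct and one of two gold triggers is found, so precision is 1/3, recall is 1/2, and F1 is 0.4. The same function applies to argument extraction if the label is taken to be the argument role.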
Ali Balali and others [26] extracted multiple event triggers and arguments simultaneously by introducing the shortest dependency path in the dependency graph. Long-range dependencies are captured by eliminating irrelevant words from the sentence. An attention-based graph convolutional network is also proposed for carrying syntactically related information along the shortest paths between argument candidates, capturing and aggregating latent associations between arguments, a problem that has been overlooked by most researchers. The results show a substantial improvement over state-of-the-art methods on two datasets, namely ACE 2005 and TAC KBP 2015.

VI. COMPARISON
Following the review of recent papers above, Table I lists the common criteria of those papers.
Most of the recently proposed systems implemented a deep-learning model to perform trigger and argument extraction. These models showed high accuracy compared to other models. Closed event extraction models give higher performance than open event extraction models.

VII. CONCLUSION
Although there are many challenges, text mining, especially open event mining, is attracting more and more attention due to its important role in information mining. This paper demonstrates a way to quickly understand EE tasks from a medium-difficulty perspective and provides concepts and definitions for EE task models and their applications. It reviews and summarizes the common issues of EE from web pages. The new deep learning models, which provide trainable models, ease the extraction of triggers and their arguments. The main difficulty to be faced is the lack of annotated corpora in specific domains.
TABLE I. LIST OF EE RESEARCH SHOWING THE MODEL, DATASET, STRENGTHS, AND WEAKNESSES OF EACH WORK.
Year | Authors | Model | Accuracy | Dataset | Strengths | Weaknesses
2019 | Hamborg et al. | Giveme5W1H | 82% | Special dataset of English news articles | More accurate results | None
2021 | Fisichella and Ceroni | Evolution-aware entity-based enrichment algorithms and temporal retrieval | 70% | Special dataset of Wikipedia articles | More accurate results; detects events in a non-annotated corpus | None
2022 | Dilek | EneMonIE | --- | Various forms of media available on the Web | Offers pluggable information extraction and other text processing components | For data sources, only textual data processing is intended
2022 | AliBalali | Attention-based graph convolutional network | --- | ACE 2005 and TAC KBP 2015 | The results show a substantial improvement over state-of-the-art methods | None