Amharic Open Information Extraction with Syntactic Sentence Simplification

Seble Girma and Yaregal Assabie

Department of Computer Science, Addis Ababa University, Addis Ababa, Ethiopia

[email protected]

Abstract. Open Information Extraction (OIE) is the process of discovering domain-independent relations from natural language text. It has recently received increased attention and has been applied extensively to various downstream applications, such as text summarization, question answering, and information retrieval. In this paper, we propose an OIE method for the Amharic language. To improve the performance of relation extraction, the proposed method implements a sentence simplification technique that breaks down complex and compound sentences into simple sentences. Linguistic rules are used to extract domain-independent and unanticipated relation instances with their arguments from simple sentences. The proposed method and algorithms are implemented and evaluated with a dataset from different domains. Test results show that the system achieved an overall precision of 0.88.

Keywords: Open Information Extraction · Chunking · Sentence Simplification · Relation Extraction

1 Introduction

Information Extraction (IE) is the task of automatically extracting structured information from text. The core task of IE systems is to identify entities and relationships expressed in natural language. However, the traditional paradigm of IE requires either hand-tagged training examples for each target relation or pre-specified relations along with hand-crafted extraction rules as input. As such inputs are specific to the target domain, shifting to a new domain requires extensive human involvement in creating new extraction patterns [1]. Thus, traditional IE systems are not portable across domains and do not scale to massive and heterogeneous corpora like the Web, where the relations are unanticipated [2]. To overcome these limitations, Open Information Extraction (OIE) has been increasingly advocated. OIE has made it possible to process massive text corpora without being restricted to a certain type of relations and attributes, and without requiring much human effort [3].

The OIE paradigm was introduced by Banko et al. [1] with the aim of developing domain-independent extractors of information by providing ways to extract unrestricted relational information from text. OIE has several advantages over traditional IE approaches [4, 5].

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2021
Published by Springer Nature Switzerland AG 2021. All Rights Reserved
M. A. Delele et al. (Eds.): ICAST 2020, LNICST 384, pp. 457–469, 2021.
https://doi.org/10.1007/978-3-030-80621-7_33

It makes it easier to extract many kinds of relations without requiring manual labor to build extraction rules and hand-tagged training examples. Because of its ability to extract information for all relations at once without having them named explicitly, it also has a significant scalability advantage over previous IE architectures. Traditional IE systems usually search for entities that are associated with the type of relation the system was configured to extract, whereas an OIE system tries to find relations, as well as the entities taking part in those relations, that are not predefined. Traditional IE systems require a specific pattern for each relation. OIE systems, on the other hand, need a set of patterns that are not tied to any specific relation, which makes them useful for extracting relations of any nature.

In recent years, a variety of OIE systems have been developed. However, most systems have been designed, implemented, and evaluated predominantly for the English language. In a variety of respects, Amharic has different linguistic structures from languages like English. To the best of our knowledge, no research has yet been done on Amharic OIE. This paper presents an OIE system that extracts domain-independent relations from Amharic text.

The remaining part of this paper is organized as follows. Section 2 presents research and development works in OIE. The proposed solution is presented in Sect. 3 and experimental results are discussed in Sect. 4. Finally, we present our conclusion in Sect. 5.

2 Related Work
OIE is the task of extracting relations with their corresponding arguments from natural language text. Some of the OIE systems that have been developed for different natural languages are reviewed below.

Banko et al. [1] introduced TextRunner, an OIE system that trained a Naïve Bayes classifier with POS and NP-chunk features to extract relationships between entities. The system used a small set of handwritten rules to heuristically label training examples from the Penn Treebank. TextRunner was evaluated using a test corpus of 9 million Web documents and it obtained 7.8 million tuples. A set of 400 randomly selected tuples was evaluated by human reviewers and 80.4% were considered correct.

Wu and Weld [6] presented an OIE system called WOE which used Wikipedia as a source of training data. The WOE system generates relation-specific training examples by matching Wikipedia Infobox content with corresponding patterns. WOE can learn two kinds of extractor: WOEparse and WOEpos. WOEparse learns from dependency path patterns, and WOEpos is trained with shallow features like POS tags. Compared with TextRunner, WOEpos runs at the same speed but achieves an F-measure that is between 18% and 34% greater. WOEparse achieves an F-measure that is between 72% and 91% higher than that of TextRunner but runs about thirty times slower due to the time required for parsing.

Etzioni et al. [7] presented ReVerb, which aimed to prevent the incoherent and uninformative extraction errors of TextRunner. To eliminate incoherent extractions and to reduce uninformative extractions, the system is designed to capture relation phrases expressed by a verb-noun combination that satisfies pre-defined syntactic and lexical constraints. It is reported that ReVerb achieves an AUC (area under the precision-recall curve) twice as large as that of TextRunner and WOEpos, and 38% greater than that of WOEparse.

Aiming to improve OIE by covering a larger number of relation expressions and expanding the OIE representation to allow additional context information such as attribution and clause modifiers, Mausam et al. [8] presented the system OLLIE. OLLIE uses the output of OIE systems to bootstrap learning of the relation patterns and then additionally applies lexical and semantic patterns to extract relations that are not expressed through verb phrases. It is reported that OLLIE produces up to 146 times as many extractions as ReVerb and obtains 1.9 to 2.7 times more area under precision-yield curves compared to ReVerb.
Del Corro and Gemulla [9] presented a clause-based approach implemented in ClausIE. For each input sentence, ClausIE first computes the dependency parse of the sentence and then determines a set of clauses using the dependency parse. Next, for each clause, it determines the set of coherent derived clauses based on the dependency parse, and finally it generates propositions from the coherent clauses. Hand-crafted rules utilizing the dependency structure of a sentence are used. It is reported that ClausIE achieves better precision than ReVerb. ClausIE's accuracy relies on the dependency parser used for parsing.

OIE systems for languages other than English have also been implemented. Gamallo et al. [10] presented a multilingual system named DepOE that uses a heuristic strategy to perform unsupervised extraction of triples, using a rule-based analyzer and dependency parser to extract relations expressed in English, Spanish, Portuguese, and Galician. The authors reported an accuracy of 68%, while ReVerb reaches 52% accuracy on the same dataset.

Tseng et al. [11] presented a Chinese OIE system called CORE that adopts existing Chinese text analysis approaches to identify the main relation in a given sentence. It is reported that CORE yields relatively more promising F1 scores than ReVerb.

Nam et al. [3] presented a Korean OIE system called SRDF. The SRDF system is designed to extract triples from Korean natural language text based on the use of singleton property and other NLP techniques such as part-of-speech tagging and chunking. SRDF enables the extraction of multiple triples from a single sentence via reification. It is reported that the system achieves 81% precision, 86% recall, and 83% F-score for detecting relations, and 66% precision, 65% recall, and 65% F-score for generating triples.

In summary, according to the experimental results reported thus far, rule-based systems achieve better accuracy than data-based systems. The results also show that systems based on dependency analysis achieve significantly higher precision and recall than those relying on shallow syntax. However, the shallow feature-based approach is very promising in terms of speed, ease of implementation, and portability to other languages. Deep syntactic parsing methods, on the other hand, are prone to slow performance, and their implementations are not easily available for many languages [8]. Moreover, because these methods rely heavily on linguistic tools such as part-of-speech taggers and dependency information, as well as immediate lexical information, to define patterns or constraints for relations, it is difficult to apply them directly to low-resourced languages like Amharic. In addition, the morphological complexity of Amharic poses unique challenges in the development of NLP applications in general and OIE in particular. Thus, in this work, we propose a design for Amharic that takes the characteristics of the language into consideration.

3 The Proposed Solution

The proposed method for Amharic OIE consists of two main tasks: Sentence Simplification and Relation Extraction. Sentence Simplification breaks complex and compound sentences down into simple sentences from which relation instances are extracted. Initially, the input sentence is divided into non-overlapping phrases using phrasal chunking, which relies on POS and morphological tags of words. Then, the Sentence Simplification component segments the sentence into a number of self-contained simple sentences that are easier to process. Finally, relation instances are extracted in N-ary format from those simplified sentences.

3.1 Sentence Simplification

The structure of sentences affects the effectiveness of relation extraction from text. Simple sentences have convenient structures for extracting relation instances. Complex and compound sentences, on the other hand, pose difficulties in the process of relation extraction. Thus, simplification of such sentences helps to improve the performance of OIE. In this work, we have developed a set of simplification rules that segment and paraphrase the input Amharic sentences and generate simpler sentences. Since the resulting sentences might need further simplification, the process of syntactic simplification is structured as a recursive loop. The loop starts by checking whether the input sentence is a simple sentence, which is determined by counting the number of verb phrases. If the sentence contains exactly one verb phrase, it is classified as a simple sentence. If two or more verb phrases joined by coordinate conjunctions are detected, the sentence is classified as a compound sentence. Otherwise, the sentence is classified as a complex sentence. The overall process of sentence simplification involves clause splitting and paraphrasing.
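The classification step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the chunk labels "VP" and "CONJ" are hypothetical names, and "joined by coordinate conjunctions" is approximated by looking for a conjunction chunk located between two verb phrases.

```python
def classify_sentence(chunk_labels):
    """Classify a chunked sentence from its sequence of chunk labels.

    The labels "VP" and "CONJ" are hypothetical names for verb-phrase
    chunks and coordinating conjunctions.
    """
    vp_positions = [i for i, label in enumerate(chunk_labels) if label == "VP"]
    if len(vp_positions) == 1:
        return "simple"
    if len(vp_positions) >= 2:
        first, last = vp_positions[0], vp_positions[-1]
        # Assumption: a conjunction lying between two verb phrases joins them.
        if "CONJ" in chunk_labels[first + 1:last]:
            return "compound"
    # Everything else falls under the "otherwise" branch of the description.
    return "complex"
```

For example, `classify_sentence(["NP", "VP", "CONJ", "NP", "VP"])` yields `"compound"`, since a conjunction separates the two verb phrases.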
Clause Splitting. Amharic clauses can be of two types: coordinate and subordinate. A compound sentence is composed of two or more independent clauses joined by a coordinating conjunction. A semicolon or a comma can also function as a conjunction. Amharic subordinate clauses, on the other hand, contain both a subject and a verb but do not express a complete thought. In Amharic, subordinate clauses are recognizable by affixes attached to the verb. Examples of Amharic subordinate clauses derived from a verb are shown in Table 1. Thus, Amharic clause splitting comprises coordinate clause splitting and subordinate clause splitting. A given sentence is iteratively split into clauses until all existing clauses are split.
Coordinate Clause Splitting. The coordinate conjunctions joining the clauses must first be detected; the sentence is then split at the conjunctions. For instance, the following sentence contains three complete and independent clauses which are joined by semicolons.

Table 1. Examples of Amharic subordinate clauses

Before splitting the sentence, it is necessary to check whether both parts of the sentence are independent clauses. To be considered an independent clause, a part should have at least one verb. Custom-created tags that combine POS and morphological information are used for splitting into coordinate clauses. Table 2 shows these tags.

Table 2. Custom created POS tags and morphological information.

Algorithm 1 shows how compound sentences are split into coordinate clauses. The algorithm accepts chunked compound sentences tagged with POS and morphological information and returns a list of clauses. The algorithm first looks for a coordinate conjunction. If a conjunction is detected, the sentence is split there and the first part is checked for a verb. If it contains a verb, the first part is added to the list of detected clauses, and the remaining part of the sentence is processed iteratively using a similar procedure.

Input: Chunked sentences tagged with POS and morphological information (S)
Output: List of clauses (CLAUSE_LIST)
Begin
  Initialize INIT to zero
  For each token T in S which is tagged as "CONJ"
    Assign the substring of S from INIT to the index of T to a variable CLAUSE
    Set VERB_COUNT to the number of chunks in CLAUSE which contain
      a verb of type (IV, AV, PV, PAV, PPV, PGAV, GAV)
    Set INIT to the index of T
    If VERB_COUNT >= 1
      Add CLAUSE to CLAUSE_LIST
    End If
  End For
  Output CLAUSE_LIST
End

Algorithm 1. Coordinate clause splitting
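A possible Python rendering of the splitting idea in Algorithm 1 is sketched below. It is an illustration under stated assumptions rather than the authors' code: each chunk is reduced to a hypothetical `(text, tag)` pair, the conjunction itself is skipped rather than carried into the next clause, and the segment after the last conjunction is also checked, which the pseudocode leaves to "a similar procedure". As in the pseudocode, a segment without a verb is not added to the result.

```python
# Verb tags named in Algorithm 1; their definitions come from Table 2 of the paper.
VERB_TAGS = {"IV", "AV", "PV", "PAV", "PPV", "PGAV", "GAV"}


def split_coordinate_clauses(chunks):
    """Split a compound sentence at conjunctions.

    chunks: list of (text, tag) pairs, where the tag "CONJ" marks a
    coordinating conjunction (hypothetical flat representation).
    """
    clauses = []
    start = 0
    boundaries = [i for i, (_, tag) in enumerate(chunks) if tag == "CONJ"]
    # Assumption: the end of the sentence acts as a final boundary.
    for end in boundaries + [len(chunks)]:
        clause = chunks[start:end]
        start = end + 1  # assumption: drop the conjunction itself
        if any(tag in VERB_TAGS for _, tag in clause):
            clauses.append(clause)
    return clauses
```

With English glosses standing in for Amharic chunks, `[("Abebe", "N"), ("came", "PV"), ("and", "CONJ"), ("Almaz", "N"), ("left", "PV")]` is split into two clauses, one on each side of the conjunction.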

Subordinate Clause Splitting. In order to simplify complex sentences, the main clause should be separated from the subordinate clause. Then, the main and the subordinate clauses are transformed into independent sentences. To this effect, we have developed an algorithm that takes chunked complex sentences tagged with POS and morphological information as input and returns a list of clauses. Since the algorithm takes chunked sentences as input, the boundaries of chunks are used to identify subordinate clauses. Algorithm 2 shows how complex sentences are split into clauses. The algorithm iterates through all chunks of the sentence to look for a chunk containing a verb of one of the types PRV, IRV, PPRV, or IPRV. If such a chunk is found, it is marked as a subordinate clause and removed from the main clause. Both resulting clauses, i.e. the main clause and the subordinate clauses, are then paraphrased to generate independent sentences.

Input: Chunked sentences tagged with POS and morphological information (S)
Output: List of clauses (CLAUSE_LIST)
Begin
  For each chunk C in S
    If C contains a verb of type (PRV, IRV, PPRV, IPRV)
      Add C to CLAUSE_LIST
      Remove C from S
    End If
  End For
  Add S to CLAUSE_LIST
  Output CLAUSE_LIST
End

Algorithm 2. Subordinate clause splitting
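Algorithm 2 could be sketched in Python as shown below. The representation of a chunk as a list of `(word, tag)` pairs is an assumption made for illustration. Instead of removing chunks from the sentence while iterating over it, the sketch collects the remaining chunks into a separate main clause, which has the same effect.

```python
# Verb tags named in Algorithm 2; their definitions come from Table 2 of the paper.
SUBORDINATE_VERB_TAGS = {"PRV", "IRV", "PPRV", "IPRV"}


def split_subordinate_clauses(chunks):
    """Separate subordinate-clause chunks from the main clause.

    chunks: list of chunks, each a list of (word, tag) pairs
    (hypothetical representation).
    Returns the subordinate clauses followed by the main clause,
    each as a flat list of (word, tag) pairs.
    """
    clause_list = []
    main_clause = []
    for chunk in chunks:
        if any(tag in SUBORDINATE_VERB_TAGS for _, tag in chunk):
            clause_list.append(list(chunk))   # subordinate clause
        else:
            main_clause.extend(chunk)         # stays in the main clause
    clause_list.append(main_clause)
    return clause_list
```

The main clause is always the last element of the returned list, mirroring the final "Add S to CLAUSE_LIST" step of the pseudocode.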

Paraphrasing. In order to produce well-formed sentences from the individual clauses, the clauses should be paraphrased. Sentence paraphrasing is the process of rewriting a sentence generated by clause splitting while preserving its meaning. The process includes rearranging the order of words in a sentence and changing the verb form by removing affixes of the verb. After the subordinate clause is split from the main clause, the affixes of the verb of the subordinate clause are removed and the order of the phrases is rearranged based on the voice of the verb. For instance, to generate a simple sentence from a relative clause, the algorithm first checks whether the voice of the verb is passive or active. If it is passive, the noun phrase found after the verb is the subject and the noun phrase found before the verb is the object. If the verb has an active voice, the noun phrase found before the verb is the subject and the noun phrase found after the verb is the object. Finally, a sentence is formed by removing the affixes of the verb and combining the subject, object, and verb. Subordinate clauses might share the subject or object of the main clause. Thus, the main clause also needs to be paraphrased. For example, the following sentence contains a relative clause where paraphrasing is made by taking the subject of the main clause.

Sentence: [ለዓመታት በውጣ ውረዶች የተፈተነው የኢትዮጵያ ትራንስፖርት አሠሪዎች ፌዴሬሽን]

[ምሥረታ ] [ተከናወነ] ፡፡
Relative clause: ለዓመታት በውጣ ውረዶች የተፈተነው የኢትዮጵያ ትራንስፖርት አሠሪዎች ፌዴሬሽን
Main clause: የኢትዮጵያ ትራንስፖርት አሠሪዎች ፌዴሬሽን ምሥረታ ተከናወነ።
Paraphrased relative clause: የኢትዮጵያ ትራንስፖርት አሠሪዎች ለዓመታት በውጣ ውረዶች ተፈተነ።
Paraphrased main clause: የኢትዮጵያ ትራንስፖርት አሠሪዎች ፌዴሬሽን ምሥረታ ተከናወነ ፡፡

Algorithm 3 shows the implementation of the paraphrasing algorithm. The algorithm takes a list of clauses as input and generates a list of well-formed sentences.

Input: List of clauses tagged with POS and morphological information (Cs)
Output: List of simple sentences (SENTENCE_LIST)
Begin
  For each clause C in Cs
    If C is a relative clause
      Replace C in the main clause by the noun phrase found after the verb
      If C contains an active verb
        Set SUBJECT to the noun phrase found before the verb
        Set OBJECT to the noun phrase found after the verb
      End If
      If C contains a passive verb
        Set SUBJECT to the noun phrase found after the verb
        Set OBJECT to the noun phrase found before the verb
      End If
      Remove affixes from the verb
      Concatenate SUBJECT, OBJECT and verb and add the result to SENTENCE_LIST
    End If
    If C is not a relative clause
      Remove affixes from the verb
      Concatenate C with all chunks found before C and add the result to SENTENCE_LIST
    End If
  End For
  Add main clause to SENTENCE_LIST
  Output SENTENCE_LIST
End

Algorithm 3. Paraphrasing algorithm
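The relative-clause branch of the paraphrasing step might look roughly like the following Python sketch. Several things here are assumptions made for illustration: chunks are `(text, kind)` pairs with a hypothetical kind `"V"` for the verb, the verb's voice is passed in by the caller, everything before or after the verb is treated as a single phrase, and affix removal is delegated to a caller-supplied `strip_affixes` function because it requires a morphological analyzer that is not reproduced here.

```python
def paraphrase_relative_clause(chunks, voice, strip_affixes):
    """Rewrite a relative clause as a subject-object-verb sentence.

    chunks: list of (text, kind) pairs containing exactly one "V" chunk.
    voice: "active" or "passive" (assumed to be known from the verb's tags).
    strip_affixes: hypothetical callable that removes the verb's affixes.
    """
    verb_idx = next(i for i, (_, kind) in enumerate(chunks) if kind == "V")
    before = [text for text, _ in chunks[:verb_idx]]
    after = [text for text, _ in chunks[verb_idx + 1:]]
    if voice == "passive":
        # Passive: the phrase after the verb is the subject.
        subject, obj = after, before
    else:
        # Active: the phrase before the verb is the subject.
        subject, obj = before, after
    verb = strip_affixes(chunks[verb_idx][0])
    return " ".join(subject + obj + [verb])
```

In a toy run with English glosses, where `"REL-"` stands in for the relative-clause affix, the passive clause `for-years through-hardship REL-was-tested the-federation` becomes `the-federation for-years through-hardship was-tested`, which follows the same reordering as the paraphrased relative clause in the example above.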

3.2 Relation Extraction

The main goal of an OIE system is to extract arbitrary relations with their corresponding arguments from natural language text. In this work, we identify and extract four relation types from a given sentence: verb-based relation, HAS relation, IS-A relation, and noun-mediated relation.

Verb-based Relation Extraction. Verb-based relations can be represented in predicate-argument structure as Rel (Arg1, Arg2), where Rel is the verb of the sentence, Arg1 is the subject of the sentence, and Arg2 is an indirect object, direct object, complement, or adverb. Since Amharic sentences have a subject-object-verb structure, verb-based relations can be detected by the presence of a verb at the end of the sentence. Algorithm 4 shows the verb-based relation extraction algorithm. Consider the following example.

The first chunk is the first argument, the verb is the relation phrase, and the other phrases are the second argument of the relation. Accordingly, the extracted relations are:

INPUT: Chunked sentences tagged with POS and morphological information (S)
OUTPUT: Relation tuples in predicate-argument structure
BEGIN
  FOR each chunk C in S
    IF C contains a verb
      Predicate = C
    ELSE IF C is the first chunk and it is a noun phrase
      Argument1 = C
    ELSE
      Add C to ArgumentList
    END IF
  END FOR
  Output the relation in the form Predicate (Argument1, ArgumentList)
END

Algorithm 4. Verb-based relation extraction
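A compact Python sketch of Algorithm 4 is given below. The `(text, label)` chunk representation and the labels `"NP"` and `"VP"` are assumptions for illustration. The sketch follows the prose description: the first noun-phrase chunk becomes the first argument, the verb chunk becomes the predicate, and every other chunk goes into the list of remaining arguments.

```python
def extract_verb_relation(chunks):
    """Extract a verb-based relation from a simple sentence.

    chunks: list of (text, label) pairs with hypothetical labels
    such as "NP", "PP" and "VP".
    Returns (predicate, argument1, other_arguments), or None if the
    sentence has no verb chunk.
    """
    predicate = None
    argument1 = None
    other_arguments = []
    for i, (text, label) in enumerate(chunks):
        if label == "VP":
            predicate = text
        elif i == 0 and label == "NP":
            argument1 = text
        else:
            other_arguments.append(text)
    if predicate is None:
        return None
    return predicate, argument1, other_arguments
```

Using English glosses of a three-chunk sentence, `[("Ethiopian Airlines flights", "NP"), ("for two hours", "PP"), ("were interrupted", "VP")]` produces the predicate `"were interrupted"` with `"Ethiopian Airlines flights"` as the first argument and `["for two hours"]` as the remaining arguments.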

HAS Relation Extraction. A HAS relation expresses possession or ownership in a sentence. In Amharic sentences, a HAS relation is implicitly expressed between two consecutive nouns when the first noun is in the genitive case. The following examples show HAS relations in Amharic.

የአበበ ልጅ ፡ HAS (አበበ ,ልጅ ).

የአለም ጓደኛ: HAS (አለም, ጓደኛ ).

Algorithm 5 shows the implementation of HAS relation extraction. The algorithm takes noun phrases tagged with POS and morphological information. It checks whether the input noun phrase contains a noun in the genitive case followed by another noun. If so, the genitive noun itself is the first argument and the noun found after it is the second argument.

INPUT: Noun phrases tagged with POS and morphological information (NP)
OUTPUT: HAS relation tuple in predicate-argument structure
BEGIN
  IF a noun in genitive case followed by another noun is found
    Argument1 = the noun itself
    Argument2 = the next noun
    Output the predicate in the form HAS (Argument1, Argument2)
  END IF
END

Algorithm 5. HAS relation extraction
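Algorithm 5 can be illustrated with the short Python sketch below. The tag names `"N_GEN"` (genitive noun) and `"N"` (noun) are hypothetical, since the paper's actual tag set is given in Table 2, and each token is assumed to carry the noun's stem, so that the genitive prefix has already been separated by the morphological analyzer.

```python
def extract_has_relations(tokens):
    """Find HAS relations inside a tagged noun phrase.

    tokens: list of (stem, tag) pairs; "N_GEN" marks a genitive noun and
    any tag starting with "N" marks a noun (hypothetical tag names).
    """
    relations = []
    # Look at every pair of consecutive tokens.
    for (word, tag), (next_word, next_tag) in zip(tokens, tokens[1:]):
        if tag == "N_GEN" and next_tag.startswith("N"):
            relations.append(("HAS", word, next_word))
    return relations
```

For the first example above, tokens corresponding to the genitive noun "Abebe" followed by the noun "child" would yield a single tuple of the form `("HAS", "Abebe", "child")`.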

IS-A Relation Extraction. An IS-A relation is an implicitly expressed relation between a proper noun and a common noun. In Amharic, this kind of relation is found when a proper noun comes after a common noun. The following example shows an IS-A relation.

የኢትዮጲያ ሯጭ ሃይሌ ገብረስላሴ: IS-A (ሃይሌ ገብረስላሴ , ሯጭ).

The implementation of IS-A relation extraction is shown in Algorithm 6. The algorithm takes noun phrases tagged with POS and morphological information as input and returns a list of IS-A relations. Iterating over each word in the text, it looks for an agent noun that is followed by another noun. If one is found, the words found after the agent noun are extracted as the first argument and the agent noun is extracted as the second argument.

INPUT: Noun phrases tagged with POS and morphological information (NP)
OUTPUT: IS-A relation tuple in predicate-argument structure
BEGIN
  IF an agent noun followed by a noun is found
    Argument1 = noun phrase found after the agent noun
    Argument2 = the agent noun
    Output the predicate in the form IS-A (Argument1, Argument2)
  END IF
END

Algorithm 6. IS-A relation extraction
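The following Python sketch illustrates Algorithm 6 under a few assumptions: tokens are `(word, tag)` pairs, `"N_AGENT"` is a hypothetical tag for agent nouns, and the "noun phrase found after the agent noun" is taken to be the run of consecutive noun tokens that follows it.

```python
def extract_isa_relations(tokens):
    """Find IS-A relations inside a tagged noun phrase.

    tokens: list of (word, tag) pairs; "N_AGENT" marks an agent noun and
    any tag starting with "N" marks a noun (hypothetical tag names).
    """
    relations = []
    for i, (word, tag) in enumerate(tokens):
        if tag != "N_AGENT":
            continue
        # Collect the consecutive nouns that follow the agent noun.
        following = []
        for next_word, next_tag in tokens[i + 1:]:
            if not next_tag.startswith("N"):
                break
            following.append(next_word)
        if following:
            relations.append(("IS-A", " ".join(following), word))
    return relations
```

With English glosses of the example above, the tokens for "Ethiopia's runner Haile Gebrselassie" would give `("IS-A", "Haile Gebrselassie", "runner")`.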

Noun-Mediated Relation Extraction. A noun-mediated relation expresses a binary relation between two nouns. In Amharic sentences, a common noun found between two proper nouns indicates the presence of a noun-mediated relation between the two proper nouns. For example, in the noun phrase የኢትዮጵያ ሯጭ ሃይሌ ገብረስላሴ, the common noun ሯጭ is a relation between ኢትዮጵያ and ሃይሌ ገብረስላሴ. It can be represented in predicate-argument structure as: ሯጭ (ሃይሌ ገብረስላሴ, ኢትዮጵያ). Agent nouns are used to detect noun-mediated relations. The implementation of noun-mediated relation extraction is shown in Algorithm 7. The algorithm takes noun phrases tagged with POS and morphological information as input. It first looks for an agent noun that is found between two nouns. If one is found, the noun found after the agent noun is extracted as the first argument, the noun found before the agent noun is extracted as the second argument, and the agent noun is extracted as the predicate.

INPUT: Noun phrases tagged with POS and morphological information (NP)
OUTPUT: Noun-mediated relation tuple in predicate-argument structure
BEGIN
  IF an agent noun is found between two nouns
    Predicate = agent noun
    Argument1 = noun phrase found after the agent noun
    Argument2 = noun phrase found before the agent noun
    Output the predicate in the form Predicate (Argument1, Argument2)
  END IF
END

Algorithm 7. Noun-mediated relation extraction algorithm
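Algorithm 7 might be sketched in Python as follows. As before, the `(word, tag)` token format and the tag `"N_AGENT"` are hypothetical, the argument after the agent noun is taken to be the run of consecutive nouns that follows it, and the argument before it is simplified to the single preceding noun.

```python
def extract_noun_mediated_relations(tokens):
    """Find noun-mediated relations inside a tagged noun phrase.

    tokens: list of (word, tag) pairs; "N_AGENT" marks an agent noun and
    any tag starting with "N" marks a noun (hypothetical tag names).
    Returns tuples of the form (predicate, argument1, argument2).
    """
    relations = []
    for i in range(1, len(tokens) - 1):
        word, tag = tokens[i]
        prev_word, prev_tag = tokens[i - 1]
        if tag != "N_AGENT" or not prev_tag.startswith("N"):
            continue
        # Collect the consecutive nouns that follow the agent noun.
        following = []
        for next_word, next_tag in tokens[i + 1:]:
            if not next_tag.startswith("N"):
                break
            following.append(next_word)
        if following:
            relations.append((word, " ".join(following), prev_word))
    return relations
```

With English glosses of the example above, the result has the same shape as the tuple given in the text: `("runner", "Haile Gebrselassie", "Ethiopia")`.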

4 Experiment
4.1 Dataset Collection
To test the performance of the system, a text corpus was collected from online Amharic news sources such as The Reporter Ethiopia¹ and Walta Media and Communication Corporate². The corpus was collected randomly from different domains to study the sensitivity of the proposed system to variation in domain. To create an annotated dataset for extraction, each sentence was tagged with POS, morphological information, and relations. A total of 215 tagged sentences were used for testing the system. From these sentences, we annotated 768 relations, consisting of 414 verb-based relations, 207 HAS relations, 78 noun-mediated relations, and 69 IS-A relations.

¹ https://www.ethiopianreporter.com.

4.2 Test Result

A system implementing the algorithms was developed and used to evaluate the proposed approach. The collected corpus was used to test the performance of the system. Test results are shown in Table 3.

Table 3. Test result

                               |------------- Extraction result -------------|
Relation        Ground truth   Total extracted   Correctly extracted   Precision
Verb-Based      414            310               286                   0.92
HAS             207            140               128                   0.91
IS              69             50                39                    0.78
Noun-Mediated   78             56                41                    0.73
TOTAL           768            556               494                   0.88
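The precision values in Table 3 follow from dividing the number of correctly extracted relations by the total number extracted. The short Python check below recomputes them from the counts in the table; the variable names are illustrative only. The overall value comes out at roughly 0.888, which the table reports as 0.88.

```python
# (total extracted, correctly extracted) per relation type, from Table 3.
results = {
    "Verb-Based": (310, 286),
    "HAS": (140, 128),
    "IS": (50, 39),
    "Noun-Mediated": (56, 41),
}


def precision(extracted, correct):
    """Precision = correctly extracted / total extracted."""
    return correct / extracted


per_relation = {name: precision(e, c) for name, (e, c) in results.items()}
total_extracted = sum(e for e, _ in results.values())
total_correct = sum(c for _, c in results.values())
overall = precision(total_extracted, total_correct)

for name, value in per_relation.items():
    print(f"{name}: {value:.3f}")
print(f"TOTAL: {overall:.3f}")
```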

4.3 Discussion

Test results show that the proposed system extracted 556 relation instances from 215 sentences with a precision of 88%. Considering the errors made by the system, we found that 56.5% of the errors are due to complex and malformed sentences. Although the performance of Amharic OIE can be significantly improved by simplifying complex sentences, the system achieves only 73% accuracy in sentence simplification. A thorough analysis of each error returned by the sentence simplification component on the test dataset showed that most of the errors are due to failures in simplifying highly complex sentences. The sentence simplification algorithm has shortcomings in handling sentences that contain one or more clauses that share the subject or object with the main clause, and/or contain other embedded clauses. This limitation often leads to incorrect and over-specified predicates and arguments and to missed relation instances, and it also produces relation instances that are inconsistent with the information contained in the original sentence. For instance, consider the following sentence.

² http://www.waltainfo.com.

በሻኪሶ ከተማ አካባቢ ነዋሪዎች በተነሳ ተቃውሞ ምክንያት ሥራውን እንዲያቋርጥ በተደረገው
የሚድሮክ ወርቅ ኩባንያ ንብረት በሆነው የለገደንቢ ወርቅ ማምረቻ ላይ የካናዳ ከፍተኛ
ባለሙያዎች ጥናት ሊያካሂዱ እንደሆነ ታወቀ፡፡

The relation generated by the system is: ሥራውን እንዲያቋርጥ ጥናት ሊያካሂዱ እንደሆነ ታወቀ (Arg1: "በሻኪሶ ከተማ አካባቢ ነዋሪዎች", Arg2: "በተነሳ ተቃውሞ ምክንያት"). The first argument of this extraction is incorrect; it should be "የካናዳ ከፍተኛ ባለሙያዎች". The extraction also contains an over-specified predicate. There is also one missed relation: ነው (Arg1: "የለገደንቢ ወርቅ ማምረቻ", Arg2: "የሚድሮክ ወርቅ ኩባንያ", Arg3: "ንብረት").

Moreover, 32.7% of the errors are due to errors in morphological analysis and POS tagging. For instance, from the sentence "በዓድዋ ጦርነት የተሰው ጀግኖችን አፅም በክብር ለማሳረፍ ታሰበ", two relation instances were expected from the system: ታሰበ (Arg1: "ጀግኖችን", Arg2: "አፅም", Arg3: "በክብር ለማሳረፍ") and ተሰው ("ጀግኖች", "በዓድዋ ጦርነት"). However, since the morphological analyzer did not label "የተሰው" as a relative verb, the relative clause was not detected and the sentence could not be simplified. As a result, only one relation instance, which contains an over-specified argument, was extracted by the system: ታሰበ (Arg1: "በዓድዋ ጦርነት የተሰው ጀግኖችን", Arg2: "አፅም", Arg3: "በክብር ለማሳረፍ").

The remaining errors made by the system are due to erroneously chunked phrases. Incorrect chunking of a sentence often leads to an incorrect predicate or incorrect arguments. For instance, the sentence "የኢትዮጵያ አየር መንገድ በረራዎች ለሁለት ሰዓታት ተቋረጡ" is chunked erroneously as: [የኢትዮጵያ አየር] [መንገድ በረራዎች] [ለሁለት ሰዓታት] [ተቋረጡ]. Thus, an incorrect relation is extracted instead of: ተቋረጡ (Arg1: "የኢትዮጵያ አየር መንገድ በረራዎች", Arg2: "ለሁለት ሰዓታት").

5 Conclusion

In this paper, we present an Amharic OIE system. It is understood that rule-based approaches operating on deep parsed sentences yield the most promising results for OIE systems, as they enable higher precision. However, Amharic has limited tools and resources, which makes it difficult to apply a deep dependency parser. As a result, we have implemented a rule-based Amharic OIE system that operates on shallow parsed sentences. To minimize the difficulty of relation extraction from shallow parsed sentences, we introduced a sentence simplification component that converts complex and compound sentences into a list of simple sentences without losing the overall semantics of the original sentence. Our experimental results have shown that sentence simplification minimizes the reliance of relation extraction on deep parsed sentences. This indicates that the performance of the system can be further improved by enhancing the capability of the sentence simplification component.

References
1. Banko, M., Cafarella, M.J., Soderland, S., Broadhead, M., Etzioni, O.: Open information extraction from the web. In: IJCAI 2007, pp. 2670–2676 (2007)
2. Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: Proceedings of the Conference on EMNLP, pp. 1535–1545. Association for Computational Linguistics, Stroudsburg (2011)
3. Sangha, N., Younggyun, N., Sejin, N., Key-Sun, C.: SRDF: Korean open information extraction using singleton property. In: Proceedings of the 14th International Semantic Web Conference (2015)
4. Mausam, M.: Open information extraction systems and downstream applications. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, pp. 4074–4077. AAAI Press (2016)
5. Abreu, S.C., Bonamigo, T.L., Vieira, R.: A review on relation extraction with an eye on Portuguese. J. Braz. Comput. Soc. 19, 553–571 (2013)
6. Wu, F., Weld, D.S.: Open information extraction using Wikipedia. In: The 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden (2010)
7. Etzioni, O., Fader, A., Christensen, J., Soderland, S.: Open information extraction: the second generation. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, pp. 3–10, Barcelona, Catalonia, Spain, 16–22 July 2011
8. Mausam, S.M., Soderland, S., Bart, R., Etzioni, O.: Open language learning for information extraction. In: EMNLP-CoNLL, pp. 523–534 (2012)
9. Del Corro, L., Gemulla, R.: ClausIE: clause-based open information extraction. In: Proceedings of the 22nd International Conference on World Wide Web, WWW 2013, pp. 355–366. ACM, New York (2013)
10. Gamallo, P., Garcia, M., Fernández-Lanza, S.: Dependency-based open information extraction. In: ROBUS-UNSUP Workshop at EACL-2012, Avignon, France (2012)
11. Tseng, Y.-H., et al.: Chinese open relation extraction for knowledge acquisition. In: EACL, pp. 12–16 (2014)

