
Enhancing Knowledge Graph Construction Using Large Language Models

1st Milena Trajanoska, 2nd Riste Stojanov, 3rd Dimitar Trajanov
Faculty of Comp. Sci. and Eng., Ss. Cyril and Methodius University, Skopje, Macedonia
ORCID: 0000-0003-0105-7693, 0000-0003-2067-3467, 0000-0002-3105-6010
arXiv:2305.04676v1 [[Link]] 8 May 2023

Abstract—The growing trend of Large Language Model (LLM) development has attracted significant attention, with models for various applications emerging consistently. However, the combined application of Large Language Models with semantic technologies for reasoning and inference is still a challenging task. This paper analyzes how current advances in foundational LLMs, like ChatGPT, compare with specialized pretrained models, like REBEL, for joint entity and relation extraction. To evaluate this approach, we conducted several experiments using sustainability-related text as our use case. We created pipelines for the automatic creation of Knowledge Graphs from raw texts, and our findings indicate that using advanced LLMs can improve the accuracy of creating these graphs from unstructured text. Furthermore, we explored the potential of automatic ontology creation using foundational LLMs, which resulted in even more relevant and accurate knowledge graphs.

Index Terms—ChatGPT, REBEL, LLMs, Relation extraction, NLP, Sustainability

I. INTRODUCTION

Technological advancements, together with the availability of Big Data, have led to a surge in the development of Large Language Models (LLMs) [1]. This trend has paved the way for a cascade of new models being released on a regular basis, each outperforming its predecessors. These models have started a revolution in the field with their capability to process massive amounts of unstructured text data and to achieve state-of-the-art results on multiple Natural Language Processing (NLP) tasks.

However, one aspect that has not yet taken over the spotlight is the combined application of these models with semantic technologies to enable reasoning and inference. This paper attempts to fill this gap by making a connection between the Deep Learning (DL) space and the semantic space through the use of NLP for creating Knowledge Graphs [2].

Knowledge Graphs are structured representations of information that capture the relationships between entities in a particular domain. They are used extensively in various applications, such as search engines, recommendation systems, and question-answering systems.

On a related note, there is a significant amount of raw text available on the Web which contains valuable information. Nevertheless, this information is unusable if it cannot be extracted from the texts and applied for intelligent reasoning. This fact has motivated us to use some of the state-of-the-art models in an attempt to extract information from text data on the Web.

Yet, creating Knowledge Graphs from raw text data is a complex task that requires advanced NLP techniques such as Named Entity Recognition [3], Relation Extraction [4], and Semantic Parsing [5]. Large language models such as GPT-3 [6], T5 [7], and BERT [8] have shown remarkable performance in these tasks, and their use has resulted in significant improvements in the quality and accuracy of knowledge graphs.

To evaluate our approach to connecting both fields, we chose to analyze the specific use case of sustainability. Sustainability is a topic of great importance for our future, and a lot of emphasis has been placed on identifying ways to create more sustainable practices in organizations. Sustainability has become the norm for organizations in developed countries, mainly due to the rising awareness of their consumers and employees. However, this situation is not reflected in developing and underdeveloped countries to the same extent. Although the perception of sustainability has improved, progress toward sustainable development has been slower, indicating the need for more concrete guidance [9]. Moreover, theoretical research has attempted to link strategic management and sustainable development in corporations in order to encourage the integration of sustainability issues into corporate activities and strategies [10]. Even though research has set a basis for developing standards and policies in favor of sustainability, a more empirical approach is needed for defining policies and analyzing an organization's sustainability level with respect to the defined policies.

In this study, the goal is to make a connection between LLMs and semantic reasoning to automatically generate a Knowledge Graph on the topic of sustainability and populate it with concrete instances using news articles available on the Web. For this purpose, we create multiple experiments in which we utilize popular NLP models, namely Relation Extraction By End-to-end Language generation (REBEL) [11] and ChatGPT [12]. We show that although REBEL is specifically trained for relation extraction, ChatGPT, a conversational agent using a generative model, can streamline the process of automatically creating accurate Knowledge Graphs from unstructured text when provided with detailed instructions.
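To make the triple-based representation discussed throughout the paper concrete, the following minimal sketch stores a tiny knowledge graph as subject-predicate-object triples and answers a simple lookup. The facts are invented for illustration and are not data from this study:

```python
# Minimal illustration: a knowledge graph as a set of
# (subject, predicate, object) triples. The facts below are
# invented for the example and do not come from the paper.
triples = {
    ("Samsung", "implements", "recycling"),
    ("recycling", "subclass of", "sustainable practice"),
    ("Samsung", "instance of", "organization"),
}

def objects_of(subject, predicate):
    """Return all objects linked to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects_of("Samsung", "implements"))  # {'recycling'}
```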
The rest of the paper is structured as follows: Section II presents a brief literature overview, Section III describes the methods and experimental setup, Section IV outlines the results of the information extraction process, and Section V concludes the paper and outlines directions for future work.

II. LITERATURE REVIEW

A. Algorithms

Our study focuses on the task of information extraction from news and reports available on the Web. For this purpose, we compare the capabilities of NLP models to generate a useful Knowledge Base on the topic.

A Knowledge Base represents information stored in a structured format, ready to be used for analysis or inference. Often, Knowledge Bases are stored in the form of a graph and are then called Knowledge Graphs.

In order to create such a Knowledge Base, we need to extract information from the raw texts in a triplet format. An example of a triplet would be <Person, Location, City>. A triplet consists of the links Entity -> Relation -> Entity, where the first entity is referred to as the subject, the relation is a predicate, and the second entity represents the object. In order to achieve this structured information extraction, we need to identify entities in the raw texts, as well as the relations connecting these entities.

In the past, this process was implemented by leveraging multi-step pipelines, where one step performed Named-entity Recognition (NER) [3] and another performed Relation classification (RC) [13]. However, these multi-step pipelines often prove to have unsatisfactory performance due to the propagation of errors between the steps. In order to tackle this problem, end-to-end approaches have been implemented, referred to as Relation Extraction (RE) [4] methods.

One of the models utilized in this study is REBEL (Relation Extraction By End-to-end Language generation) [11], an auto-regressive seq2seq model based on BART [14] that performs end-to-end relation extraction for more than 200 different relation types. The model achieves 74 micro-F1 and 51 macro-F1 scores. It was created for the purpose of joint entity-relation extraction.

REBEL is a generative seq2seq model which attempts to "translate" the raw text into a triple format. The REBEL model outputs additional tokens, which are used during its training to identify a triplet. These tokens include <triplet>, which represents the beginning of a triplet; <subj>, which represents the end of the subject and the start of the predicate; and <obj>, which represents the end of the predicate and the start of the object. The authors of REBEL provide a parsing function for extracting the triplets from the model output.

The second approach we took was to use ChatGPT [12] as a conversational agent and compare its performance in the task of entity-relation extraction and creation of a common Knowledge Base. The agent consists of three steps, including separate models: a supervised fine-tuning (SFT) model based on GPT-3 [6], a reward model, and a reinforcement learning model.

ChatGPT was trained using Reinforcement Learning from Human Feedback (RLHF) [15], employing methods similar to InstructGPT with minor variations in data collection. An initial model is trained through supervised fine-tuning, with human AI trainers engaging in conversations, assuming both user and AI assistant roles. To aid in formulating responses, trainers were given access to model-generated suggestions. The newly created dialogue dataset was then combined with the InstructGPT dataset, which was transformed into a dialogue format. In order to establish a reward model for reinforcement learning, comparison data needed to be gathered, consisting of two or more model responses ranked by quality. This data was collected by taking conversations between AI trainers and the chatbot, randomly selecting a model-generated message, sampling multiple alternative completions, and having AI trainers rank them. The reward models enabled fine-tuning of ChatGPT using Proximal Policy Optimization [16], and several iterations of this procedure were executed.

B. Use case: Sustainability

The Global Sustainability Study 2022 reported that 71% of 11,500 surveyed consumers around the world are making changes to the way they live and the products they buy in an effort to live more sustainably [17]. This shows that corporations need to change their operations to be more sustainable not only for the sake of the environment but also to be able to stay competitive.

With the vast amount of unstructured data available on the Web, it is crucial to develop methods that can automatically identify sustainability-related information from news, reports, papers, and other forms of documents. One such study identifies this opportunity and attempts to create a method for directly extracting non-financial information generated by various media to provide objective ESG information [18]. The authors trained an ESG classifier and recorded a 4-class classification accuracy of 86.66% on texts which they manually labeled. On a related note, researchers have taken a step further to extract useful ESG information from texts. In [19], the authors trained a joint entity and relation extraction model on a private dataset consisting of ESG and CSR reports annotated internally at Crédit Agricole. They were able to identify entities such as coal activities and environmental or social issues. In [20], the authors presented an approach for knowledge graph generation based on ESG-related news and company official documents.

III. METHODS

This section describes the methods used in this research, including the data collection process and the entity-relation extraction algorithms used to analyze the gathered data.
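The REBEL decoding scheme described in Section II-A can be illustrated with a small parser. This is a simplified sketch that follows the token layout described above (subject, <subj>, predicate, <obj>, object); it is not the authors' official parsing function, and the sample string is invented:

```python
def parse_rebel_output(text):
    """Parse a REBEL-style linearized string into (subject,
    predicate, object) triplets, following the token scheme
    described in Section II-A. Simplified sketch, not the
    official parsing function released with REBEL."""
    triplets = []
    # Each <triplet> token starts a new triplet.
    for chunk in text.split("<triplet>")[1:]:
        head, _, rest = chunk.partition("<subj>")     # end of subject
        predicate, _, tail = rest.partition("<obj>")  # end of predicate
        if head.strip() and predicate.strip() and tail.strip():
            triplets.append((head.strip(), predicate.strip(), tail.strip()))
    return triplets

# Invented example of a linearized model output:
out = "<triplet> Samsung <subj> implements <obj> recycling"
print(parse_rebel_output(out))  # [('Samsung', 'implements', 'recycling')]
```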
A. Data Collection Process

In order to conduct the experimental comparison of the two approaches for entity-relation extraction, news data was gathered from the Web on the topic of sustainability. For this purpose, the News API [21] system was used. News API is an HTTP REST API for searching and retrieving live articles from all over the Web. It provides the ability to search through posted articles by specifying the following options: keyword or phrase, date of publication, source domain name, and language.

Using News API, 94 news articles from 2023-02-15 to 2023-03-19 on the topic of sustainability were collected. The collected texts varied in length from 50 to over 4,200 words. Given the limit on the number of tokens that can be passed as input to a language model, additional pre-processing steps needed to be taken to account for the longer texts.

B. Relation-Extraction Methods

Relation extraction is a fundamental task in NLP that aims to identify the semantic relationships between entities in a sentence or document. The task is challenging because it requires understanding the context in which the entities appear and the types of relationships that exist between them. In this subsection, we describe how we utilize REBEL and ChatGPT for the task of relation extraction.

1) REBEL: Our first approach was to use REBEL in an attempt to extract relations from unstructured news articles. For REBEL to be able to use the provided texts, they need to be tokenized with the corresponding tokenizer function. Tokenization is the process of separating the raw text into smaller units called tokens; tokens can refer to words, characters, or sub-words. The model has a limit of 512 input tokens, which means that longer articles need to be pre-processed before being sent to the model for triplet extraction.

To address this limitation, we tokenize the raw text and divide the tokens into 256-token batches. These batches are processed separately by the REBEL model, and the results are subsequently merged to extract relations for longer texts. Metadata is also added to the extracted relations, referencing the token batch from which the relation was derived. With this approach, some relations may not be extracted accurately because a batch of tokens might begin or end in the middle of a sentence. However, the number of cases where this happens is insignificant, so we leave their handling for future work.

Once the entity-relation extraction process is finished, the extracted information is stored in a triplet structure. To further normalize the extracted entities, we perform Entity Linking [22]. Entity Linking refers to the identification and association of entity mentions in raw text with their corresponding entities in a Knowledge Base. The process of Entity Linking is not part of the REBEL model; it is an additional post-processing step used to refine the extracted relations. In this study, we utilize DBpedia as our Knowledge Base and consider two entities identical if they share the same DBpedia URL. This approach will not work for entities that are not present on DBpedia.

2) ChatGPT: The second approach taken in this paper uses OpenAI's ChatGPT [12]. We have created two experiments using ChatGPT.

The first experiment prompts ChatGPT to extract relations from the collected news articles. After extracting the relations, we follow the same steps as with the REBEL model in order to create a comprehensive Knowledge Base.

The second experiment focuses on creating a prompt that directly generates the entire Knowledge Base and writes an ontology describing the concepts identified in the texts. This approach has the goal of reducing the number of manual steps which need to be performed in order to obtain the final Knowledge Graph.

For both experiments, we set the value of the 'temperature' parameter to 0 in order to get more deterministic outputs, since OpenAI models are non-deterministic by nature.

Experiment 1. For the first experiment, we prompt ChatGPT to extract relations connected to sustainability. ChatGPT was able to successfully extract entities, connect them with relations, and return the results in a triple format. After the relations had been extracted, the same post-processing step of Entity Linking was applied to the results from ChatGPT.

Although ChatGPT was able to extract entities from the articles and link them with relations, it was not successful at abstracting concepts. The entities and relations identified often represented whole phrases instead of concepts.

To overcome this obstacle, we prompted ChatGPT to map the identified entities and relations to a suitable OWL ontology [23]. However, ChatGPT failed to identify relevant sustainability concepts or define their instances. The identified classes, such as Company, Customer, MarketingEcosystem, Resource, CustomerExperience, Convenience, and DigitalMarketing, had some potential relevance to sustainability, but ChatGPT did not identify any instances for these classes.

Experiment 2. In the second experiment, we refined the prompt to ask ChatGPT to explicitly generate an OWL ontology on sustainability, which includes concepts like organizations, actions, practices, policies, and related terms. We also allowed ChatGPT to create additional classes and properties if necessary. We explicitly requested the results to be returned in RDF Turtle format.

Providing additional information to ChatGPT resulted in the creation of an improved Knowledge Base. ChatGPT was able to define concepts such as organizations, actions, practices, and policies, as well as identify suitable relations to connect them. Moreover, it was able to create instances of the defined classes and properties and link them together. This shows that adding more specific instructions to the prompts for ChatGPT can produce drastically different results.
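The batching step from Section III-B (splitting long articles into 256-token batches for REBEL, with batch metadata attached to each result) can be sketched as follows. A whitespace split stands in for REBEL's real subword tokenizer, so the code is illustrative only:

```python
def batch_tokens(tokens, batch_size=256):
    """Split a token list into fixed-size batches and attach the
    batch index as metadata, mirroring the pre-processing used for
    articles longer than REBEL's 512-token input limit. Note that
    a batch may still begin or end mid-sentence, the known
    limitation discussed in Section III-B."""
    return [
        {"batch_index": i // batch_size, "tokens": tokens[i:i + batch_size]}
        for i in range(0, len(tokens), batch_size)
    ]

# Stand-in tokenizer: whitespace split instead of REBEL's subword tokenizer.
article = "sustainability " * 600
batches = batch_tokens(article.split())
print(len(batches), len(batches[0]["tokens"]))  # 3 256
```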
IV. RESULTS

This section presents the results from the experiments described in Section III. A comparison of the Knowledge Bases created by both methods is given, and the characteristics of the generated Knowledge Bases are outlined. Table I shows the number of entities, relations, and triplets extracted from the raw texts on sustainability by the REBEL model and by the first experiment with ChatGPT, respectively.

TABLE I
KNOWLEDGE BASE STRUCTURE COMPARISON

Algorithm   Entities   Relations   Triples
REBEL       805        105         854
ChatGPT     1158       677         826

As is evident from the table, the number of triplets extracted by both algorithms is similar. However, the number of entities that ChatGPT extracts is larger than that from REBEL. Although this is true, many of the extracted entities are not connected to each other via any relation, thus defeating the purpose of creating a Knowledge Base. Moreover, the number of unique relations is far too large for ChatGPT to be able to produce an ontology that can be used for further experimentation.

The most frequent relation for the REBEL model is the 'subclass of' relation, appearing in 120 triplets. For ChatGPT, it is the 'has' relation, identified in 29 triplets. In addition, ChatGPT often fails to generate standard relations and entities which represent abstract concepts, and instead outputs an entire phrase, such as in the example 'has already surpassed a goal set in 2019 to install 100,000 heat pumps in homes and businesses', which it identifies as a relation.

The following subsections provide a visual display of a subset of the generated Knowledge Bases from both algorithms.

A. REBEL

In order to analyze the Knowledge Base generated using the REBEL model more accurately, we have created a visualization in a graph format, where each entity represents a node in the graph and each relation represents an edge. Fig. 1 displays a subset of the extracted Knowledge Base.

Fig. 1. Subset of the Knowledge Base generated using the REBEL model. The Knowledge Base is displayed in a graph format where entities are represented as nodes and relations are represented as edges.

It is visible from the figure that the model successfully identifies entities related to sustainability, such as 'sustainability', 'recycling', 'clean technology', 'business model', and 'repurposing', and even links corporations such as 'Samsung' to these entities. We can notice that multiple entities are interlinked in a meaningful way.

B. ChatGPT

The same visualization for the Knowledge Base generated by the first experiment with ChatGPT is presented in this subsection. Fig. 2 displays a subset of the extracted Knowledge Base.

Fig. 2. Subset of the Knowledge Base generated using the first experiment with ChatGPT. The Knowledge Base is displayed in a graph format where entities are represented as nodes and relations are represented as edges.

We can see from the figure that ChatGPT is able to identify entities related to sustainability, but they are represented as phrases instead of concepts. For example, ChatGPT extracts 'small high-value items in jumbo packaging', 'steps and waste from its supply chain', and 'suppliers to use recycled and recyclable materials' as entities.

Although these phrases are related to sustainability, they do not represent specific entities. This is a result of the fact that ChatGPT is a conversational model trained to generate responses to a provided prompt, not specifically trained to recognize entities and relations. On the other hand, ChatGPT is able to identify some concepts that REBEL does not, and additionally, it is able to link corporations to specific sustainability-related phrases.

Prompt engineering [24] is of great importance when it comes to the results generated by ChatGPT [12]. Since it is a generative model, small variations in the input sequence can create large differences in the produced output.
Observing the full Knowledge Base generated using ChatGPT, most of the time the extracted entities represent phrases or whole sentences, which is not beneficial for creating a Knowledge Base because it is hard to normalize the entities and relations and to create a more general ontology consisting of the concepts represented in the graph.

For this reason, we conducted the second experiment with ChatGPT, where we defined a more detailed prompt and instructed ChatGPT to generate an ontology based on each article it sees and, additionally, to define instances of the generated ontology based on the information present in each article. Fig. 3 presents the results of the refined prompt, with the ontology and instances generated from a single article out of the 94 collected articles.

Fig. 3. Knowledge Base generated with ChatGPT for the first article. The identified concepts are represented as yellow rectangles, and the instances are represented with green rectangles.

Not only does ChatGPT create an ontology using the concepts it was instructed to use, but it also defines classes on its own and is able to create instances of most of the classes accurately. As an example, it identifies the entity "Soluna" as an "instanceOf" the class "Organizations". Furthermore, it is able to identify the triplets <Soluna, utilizes, Excess Energy> and <Excess Energy, instanceOf, Practices>.

These types of triplets already start to represent an initial knowledge base, which can answer queries about companies that implement practices using excess energy. Although the hierarchy of concepts could be better defined so that more complex queries can be answered, this method represents a solid start in building a shared Knowledge Base using only unstructured texts.

Using another article, the ontology and instances given in Fig. 4 were generated. Looking at this second example, we can see that ChatGPT links practices, actions, and policies to the organizations, which was not the case in the previous example. Additionally, it identifies the triplets <Starbucks, instanceOf, Organization> and <Starbucks, hasPractice, ResourceSharing>. This also allows for answering complex queries in the sustainability domain.

Fig. 4. Knowledge Base generated with ChatGPT for the second article. The identified concepts are represented as yellow rectangles, and the instances are represented with green rectangles.

While the consistency of the generated ontologies may be limited, our analysis reveals that there are significant similarities between them. Therefore, future research can explore methods for unifying these ontologies across all articles, which has the potential to enhance the overall definition of concepts and their interrelationships in the sustainability domain.

It is important to mention that due to the limitations on the length of the input prompt passed to ChatGPT, it was not possible to prompt the model first to define an ontology based on all articles on sustainability and then create instances from all the other articles using the same ontology.

C. Quality Evaluation

Since the evaluation of a Knowledge Base cannot be performed in an automated way based on some metric when ground truth data is not available, we need to apply qualitative principles in order to evaluate the results. Based on the practical framework defined in [25], the following 18 principles were identified:

1) Triples should be concise
2) Contextual information of entities should be captured
3) The knowledge graph should not contain redundant triples
4) The knowledge graph should be updatable dynamically
5) Entities should be densely connected
6) Relations among different types of entities should be included
7) The data source should be multi-field
8) Data for constructing the knowledge graph should come in different types and from different resources
9) Synonyms should be mapped, and ambiguities should be eliminated to ensure reconcilable expressions
10) The knowledge graph should be organized in structured triples so that it can be easily processed by machines
11) The knowledge graph should be scalable with respect to its size
12) The attributes of the entities should not be missing
13) The knowledge graph should be publicly available rather than proprietary
14) The knowledge graph should be an authority
15) The knowledge graph should be concentrated
16) The triples should not contradict each other
17) For domain-specific tasks, the knowledge graph should be related to that field
18) The knowledge graph should contain the latest resources to guarantee freshness
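Some of the listed principles lend themselves to mechanical checks. The sketch below tests principle 3 (no redundant triples) and a crude word-count heuristic for principle 1 (conciseness); both checks and the sample triples are our own illustration, not part of the cited framework:

```python
def redundant_triples(triples):
    """Principle 3: return triples that occur more than once."""
    seen, dupes = set(), set()
    for t in triples:
        (dupes if t in seen else seen).add(t)
    return dupes

def verbose_triples(triples, max_words=4):
    """Principle 1 (crude heuristic): flag triples whose subject,
    predicate, or object is a long phrase rather than a concept."""
    return [t for t in triples
            if any(len(part.split()) > max_words for part in t)]

# Invented sample data for the two checks:
triples = [
    ("Samsung", "implements", "recycling"),
    ("Samsung", "implements", "recycling"),  # redundant triple
    ("ExampleCorp", "has already surpassed a goal set in 2019", "heat pumps"),
]
print(redundant_triples(triples))
print(verbose_triples(triples))
```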
According to these principles, in our use case, we manually inspected the Knowledge Graphs generated with the proposed methods, and we can conclude that the second ChatGPT approach creates a Knowledge Graph of greater quality compared to the other two Knowledge Bases.

However, it should be noted that to create these Knowledge Bases, a few steps of refining the answers from ChatGPT are needed. Sometimes the produced output is erroneous and needs to be corrected before proceeding. This calls for methods for automatically identifying incorrect OWL syntax and requesting that the model fix its previous output.

V. CONCLUSION

In this paper, we presented a Natural Language Processing-based method for constructing a Knowledge Graph on the topic of sustainability using raw documents available on the Web. The study demonstrated that meaningful information can be extracted from unstructured data through an automated process, which can subsequently be utilized for decision-making and process modeling. The focus on sustainability served as a concrete use case, illustrating the effectiveness and potential of the presented approach.

Although the experiments were conducted on the use case of sustainability, the primary emphasis is on the methodology itself, which lays the foundation for empirical analysis of qualitative data derived from various sources. The construction of a Knowledge Base using the presented approach can serve as a first step for analyzing diverse aspects of any subject matter and answering complex queries based on the gathered information.

In future research, we first plan to adopt a more formal framework for assessing the quality of generated knowledge graphs, which will enable us to evaluate KGs effectively and provide a standardized means of assessing their overall quality. We also want to extend the presented methodology to other domains, unifying the generated knowledge bases and employing graph-based modeling to predict missing links between concepts and relationships for a given domain.

REFERENCES

[1] T. Brants, A. C. Popat, P. Xu, F. J. Och, and J. Dean, "Large language models in machine translation," in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 858–867, Association for Computational Linguistics, June 2007.
[2] X. Chen, S. Jia, and Y. Xiang, "A review: Knowledge reasoning over knowledge graph," Expert Systems with Applications, vol. 141, p. 112948, 2020.
[3] A. Mikheev, M. Moens, and C. Grover, "Named entity recognition without gazetteers," in Ninth Conference of the European Chapter of the Association for Computational Linguistics, pp. 1–8, 1999.
[4] G. Zhou, J. Su, J. Zhang, and M. Zhang, "Exploring various knowledge in relation extraction," in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pp. 427–434, 2005.
[5] A. Kamath and R. Das, "A survey on semantic parsing," arXiv preprint arXiv:1812.00978, 2018.
[6] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[7] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, "Exploring the limits of transfer learning with a unified text-to-text transformer," The Journal of Machine Learning Research, vol. 21, no. 1, pp. 5485–5551, 2020.
[8] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[9] U. Nations, "World's poorest nations left behind in reaching sustainable development goals, delegates stress as second committee begins general debate," 2018. [Link]
[10] R. J. Baumgartner and R. Rauter, "Strategic perspectives of corporate sustainability management to develop a sustainable organization," Journal of Cleaner Production, vol. 140, pp. 81–92, 2017.
[11] P.-L. H. Cabot and R. Navigli, "REBEL: Relation extraction by end-to-end language generation," in Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 2370–2381, 2021.
[12] OpenAI, "GPT-4 technical report," arXiv preprint arXiv:2303.08774, 2023.
[13] D. Zeng, K. Liu, S. Lai, G. Zhou, and J. Zhao, "Relation classification via convolutional deep neural network," in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 2335–2344, 2014.
[14] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7871–7880, Association for Computational Linguistics, July 2020.
[15] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al., "Training language models to follow instructions with human feedback," Advances in Neural Information Processing Systems, vol. 35, pp. 27730–27744, 2022.
[16] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[17] Simon-Kucher, "2022 global sustainability study: The growth potential of environmental change." [Link]
[18] J. Lee and M. Kim, "ESG information extraction with cross-sectoral and multi-source adaptation based on domain-tuned language models," Expert Systems with Applications, p. 119726, 2023.
[19] A. Ehrhardt and M. T. Nguyen, "Automated ESG report analysis by joint entity and relation extraction," in Machine Learning and Principles and Practice of Knowledge Discovery in Databases: International Workshops of ECML PKDD 2021, Virtual Event, September 13–17, 2021, Proceedings, Part II, pp. 325–340, Springer, 2022.
[20] I. Vodenska, R. Trajanov, L. Chitkushev, and D. Trajanov, "Challenges and opportunities in ESG investments," in Computer Science and Education in Computer Science: 18th EAI International Conference, CSECS 2022, On-Site and Virtual Event, June 24–27, 2022, Proceedings, pp. 168–179, Springer, 2022.
[21] [Link], "NewsAPI." [Link]
[22] W. Shen, J. Wang, and J. Han, "Entity linking with a knowledge base: Issues, techniques, and solutions," IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 2, pp. 443–460, 2014.
[23] D. L. McGuinness, F. Van Harmelen, et al., "OWL Web Ontology Language overview," W3C Recommendation, vol. 10, no. 10, 2004.
[24] E. Saravia, "Prompt Engineering Guide," [Link]ai/Prompt-Engineering-Guide, Dec. 2022.
[25] H. Chen, G. Cao, J. Chen, and J. Ding, "A practical framework for evaluating the quality of knowledge graph," in Knowledge Graph and Semantic Computing: Knowledge Computing and Language Understanding: 4th China Conference, CCKS 2019, Hangzhou, China, August 24–27, 2019, Revised Selected Papers, pp. 111–122, Springer, 2019.
