0% found this document useful (0 votes)
23 views37 pages

Systems 11 00351 v2

This document discusses the integration of ChatGPT into the systematic review (SR) process to enhance automation and efficiency, particularly focusing on IoT applications in water and wastewater management. The methodology includes four modules for literature search, screening, data extraction, and content analysis, demonstrating ChatGPT's effectiveness in improving accuracy and saving time compared to traditional methods. While the study highlights the potential of ChatGPT, it also acknowledges limitations in its application for article extraction and emphasizes the need for further research to optimize its use in systematic reviews.

Uploaded by

Himura Yui
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
23 views37 pages

Systems 11 00351 v2

This document discusses the integration of ChatGPT into the systematic review (SR) process to enhance automation and efficiency, particularly focusing on IoT applications in water and wastewater management. The methodology includes four modules for literature search, screening, data extraction, and content analysis, demonstrating ChatGPT's effectiveness in improving accuracy and saving time compared to traditional methods. While the study highlights the potential of ChatGPT, it also acknowledges limitations in its application for article extraction and emphasizes the need for further research to optimize its use in systematic reviews.

Uploaded by

Himura Yui
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 37

systems

Article
Harnessing the Power of ChatGPT for Automating Systematic
Review Process: Methodology, Case Study, Limitations, and
Future Directions
Ahmad Alshami 1 , Moustafa Elsayed 2 , Eslam Ali 3,4, *, Abdelrahman E. E. Eltoukhy 5, * and Tarek Zayed 3

1 Department of Civil and Environmental Engineering, FAMU-FSU College of Engineering,


Florida State University, Tallahassee, FL 32013, USA; [email protected]
2 Department of Civil and Environmental Engineering, FAMU-FSU College of Engineering,
Florida A&M University, Tallahassee, FL 32013, USA; [email protected]
3 Department of Building and Real Estate, Faculty of Construction and Environment,
The Hong Kong Polytechnic University, Kowloon TU428, Hong Kong; [email protected]
4 Public Works Department, Geomatics Lab, Faculty of Engineering, Cairo University, Giza 12613, Egypt
5 Department of Industrial and System Engineering, The Hong Kong Polytechnic University,
Hung Hom TU428, Hong Kong
* Correspondence: [email protected] (E.A.); [email protected] (A.E.E.E.)

Abstract: Systematic reviews (SR) are crucial in synthesizing and analyzing existing scientific lit-
erature to inform evidence-based decision-making. However, traditional SR methods often have
limitations, including a lack of automation and decision support, resulting in time-consuming and
error-prone reviews. To address these limitations and drive the field forward, we harness the power
of the revolutionary language model, ChatGPT, which has demonstrated remarkable capabilities
in various scientific writing tasks. By utilizing ChatGPT’s natural language processing abilities,
our objective is to automate and streamline the steps involved in traditional SR, explicitly focusing
on literature search, screening, data extraction, and content analysis. Therefore, our methodology
comprises four modules: (1) Preparation of Boolean research terms and article collection, (2) Abstract
screening and articles categorization, (3) Full-text filtering and information extraction, and (4) Content
analysis to identify trends, challenges, gaps, and proposed solutions. Throughout each step, our
Citation: Alshami, A.; Elsayed, M.; focus has been on providing quantitative analyses to strengthen the robustness of the review process.
Ali, E.; Eltoukhy, A.E.E.; Zayed, T.
To illustrate the practical application of our method, we have chosen the topic of IoT applications in
Harnessing the Power of ChatGPT
water and wastewater management and quality monitoring due to its critical importance and the
for Automating Systematic Review
dearth of comprehensive reviews in this field. The findings demonstrate the potential of ChatGPT in
Process: Methodology, Case Study,
bridging the gap between traditional SR methods and AI language models, resulting in enhanced
Limitations, and Future Directions.
Systems 2023, 11, 351. https://
efficiency and reliability of SR processes. Notably, ChatGPT exhibits exceptional performance in
doi.org/10.3390/systems11070351 filtering and categorizing relevant articles, leading to significant time and effort savings. Our quanti-
tative assessment reveals the following: (1) the overall accuracy of ChatGPT for article discarding
Academic Editor: William T. Scherer
and classification is 88%, and (2) the F-1 scores of ChatGPT for article discarding and classification
Received: 8 June 2023 are 91% and 88%, respectively, compared to expert assessments. However, we identify limitations in
Revised: 4 July 2023 its suitability for article extraction. Overall, this research contributes valuable insights to the field of
Accepted: 7 July 2023 SR, empowering researchers to conduct more comprehensive and reliable reviews while advancing
Published: 9 July 2023 knowledge and decision-making across various domains.

Keywords: ChatGPT; systematic review; automation; Internet of Things (IoT); article filtration; article
categorization; information extraction; content analysis
Copyright: © 2023 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
1. Introduction
Attribution (CC BY) license (https:// Review articles serve various purposes within the academic literature with differ-
creativecommons.org/licenses/by/ ent types, including narrative reviews, Systematic reviews (SR), meta-analyses, scoping
4.0/). reviews, and integrative reviews [1]. Narrative reviews provide a broad overview and

Systems 2023, 11, 351. https://doi.org/10.3390/systems11070351 https://www.mdpi.com/journal/systems


Systems 2023, 11, 351 2 of 37

subjective analysis of existing literature [2], while SRs employ a thorough methodology
to synthesize all relevant studies on a specific research question, ensuring objectivity and
minimizing bias [3]. SRs offer several advantages, such as providing a reliable and com-
prehensive assessment of evidence, guiding evidence-based practice and policymaking,
identifying research gaps, and enhancing statistical power through meta-analysis [4,5]. It
is worth mentioning that SR articles are a valuable tool for synthesizing and analyzing
research evidence in many fields of research, particularly in fields where research evidence
is constantly evolving, such as in healthcare [6–9], project management [10–13], construc-
tion management [14–19], and aviation routing management [20,21]. To ensure that SRs
are reported accurately and comprehensively, PRISMA (Preferred Reporting Items for SRs
and Meta-Analyses) is widely used. Developing and executing a comprehensive search
strategy to conduct an SR using the PRISMA method is essential.
The search strategy is vital in identifying relevant studies to be included in SRs. Such
a strategy involves carefully selecting appropriate databases, applying pertinent Boolean
research terms (BST) and keywords, and executing systematic searches to capture a com-
prehensive variety of evidence related to the research question [22]. In accordance to the
PRISMA guidelines [23], inclusion and exclusion criteria are also crucial in the SR process.
These predetermined criteria help assess the relevance of articles during the study selection
phase, ensuring that the chosen studies align with the review’s objectives and provide
pertinent information to address the research question. Furthermore, snowballing is mainly
applied to identify additional relevant articles that may have been missed in the initial liter-
ature search. The snowballing process can be achieved by gathering the articles from the
references (backward) and citation (forward) lists of included studies [24]. However, it is
important to acknowledge the PRISMA method’s limitations, including potential reporting
bias, the challenges of adapting to different review articles, human uncertainties in deter-
mining the article’s eligibility, and the time consumed including and excluding articles from
the database [25,26]. Despite these limitations, the SR process, PRISMA guidelines, and
snowballing procedures significantly all contribute to evidence synthesis and knowledge
advancement across various fields. With the continued advancement of AI-driven language
and chatbot technologies, there is an increasing potential for automating the SR process
through alternative methods. Leveraging these AI-powered tools offers opportunities to
streamline the SR process, saving time and costs while addressing uncertainties arising
from human responses. By exploring these possibilities, we can optimize workflows and
enhance the overall efficiency of conducting SR.
ChatGPT (Generative Pre-trained Transformer) has proven to be a valuable tool
in various fields, including healthcare [27–29], education [30–33], construction manage-
ment [34,35], and scientific writing [36–38]. Within scientific writing, ChatGPT has proven
its efficacy in generating abstracts, introductions, and research article summaries, while
also assisting with SR processes by extracting relevant information and providing concise
summaries [39,40]. Its capabilities as a powerful language model extends beyond simple
language generation, offering valuable suggestions for structuring the article, enhancing
clarity, and ensuring a logical flow [41]. Collaborating with ChatGPT empowers researchers
to outline different manuscript sections, including the introduction, methods, results, and
discussion, facilitating comprehensive and cohesive narratives [42]. Furthermore, Chat-
GPT’s role extends to the editing and proofreading stages of scientific writing, serving as
a meticulous grammar and language checker to adhere to the required style and format-
ting guidelines [43]. However, it is essential to recognize that while ChatGPT provides
indispensable support, its usage should complement human expertise. Researchers must
critically evaluate the model’s outputs, thoroughly verify information, and ensure the accu-
racy and reliability of the generated content [44]. By combining the capabilities of ChatGPT
with human insight, researchers can significantly enhance the efficiency, productivity, and
overall quality of their research and scientific writing endeavors.
Despite the capabilities of ChatGPT in various aspects of scientific writing, there is
no previous research focusing on automating the SR process by levering the power of
Systems 2023, 11, 351 3 of 37

ChatGPT. However, a recent study by Qureshi [45] has raised important questions about
the possibilities of ChatGPT in automating the SR process. It is worth mentioning that
this study [45] just raised the question and discussed ChatGPT’s capabilities in the SR
process; however, they did not introduce a practical implementation of how we can do this
by levering the ChatGPT. While acknowledging the outstanding capabilities of ChatGPT in
automating the SR process, the study [45] recommended further research to investigate its
limitations and capacities. Therefore, our paper aims to bridge this gap by harnessing the
power of ChatGPT to introduce a practical implementation of the automated SR process.
Our main focus is on streamlining the traditional process of SR and introducing practical
implementations of ChatGPT at different stages of the SR process.
In order to showcase the practical implementation of our methodology, we delve
into the extensive domain of Internet of Things (IoT) applications pertaining to water
and wastewater management, as well as water quality monitoring. This subject holds
significant importance due to the transformative impact of IoT in these particular domains.
By undertaking this exploration, we contribute to the automation of the systematic review
(SR) process, which can be applicable to various research fields, and provide valuable
insights into the current state of IoT technologies in these critical areas.
Our approach encompasses a series of well-designed steps, commencing with a com-
prehensive and systematic search across relevant databases. Subsequently, we employ
stringent filtering and extraction techniques to extract the most pertinent information from
the collected literature. This is followed by a thorough content analysis of the selected
studies, enabling us to unveil patterns, identify emerging trends, and gain a holistic under-
standing of the overall landscape regarding IoT applications in water management and
water quality monitoring. By harnessing the capabilities of ChatGPT technology, we can
leverage its natural language processing capabilities to streamline the analysis process and
unveil concealed connections within the research corpus.
It is important to emphasize that while this paper outlines the methodology for
conducting an SR, it does not delve into the specific findings regarding IoT applications in
water management and water quality monitoring. Instead, the findings will be meticulously
documented and published separately, allowing for a comprehensive exploration of this
dynamic and critical area. The detailed objectives of the study can be summarized in the
following points:
X To investigate the potential of ChatGPT in generating relevant keywords and phrases
for literature search in water and wastewater management applications and water
quality monitoring.
X To compare the accuracy and efficiency of utilizing ChatGPT for screening and filtering
studies to be included in an SR, in contrast to conventional methods.
X To assess the completeness and accuracy of employing ChatGPT in extracting and
synthesizing information from abstracts and full-text articles of the selected studies.
X To compare the quality and rigor of the SR process when utilizing ChatGPT against
traditional SR methods. This comparison will consider various metrics, including
reproducibility, bias, and transparency.
X To provide comprehensive guidance on the best practices for integrating ChatGPT into
the methodology of SRs specifically focused on water and wastewater management.
To fulfil the objectives of this study, a novel methodology is devised to integrate
ChatGPT into the SR procedure, and its performance is compared against traditional SR
approaches. This paper makes a valuable contribution to the existing body of knowledge
on utilizing artificial intelligence (AI) in advancing SR methodologies by presenting an
innovative approach that leverages ChatGPT (based on the GPT-3.5 architecture model)
to enhance the overall process. The proposed methodology is employed to conduct an SR
article focusing on IoT applications in water and wastewater management. Furthermore,
the implications and limitations of this methodology for future research endeavors in the
field are thoroughly examined and discussed.
Systems 2023, 11, 351 4 of 37

2. Research Methodology
2.1. Exploring ChatGPT: Characteristics and Interactions
ChatGPT is a powerful language model that is specifically designed to facilitate in-
teractive conversations and simulate human-like dialogue. It is built upon the foundation
of GPT-3.5, an advanced variant of the GPT-3 model developed by OpenAI. ChatGPT
leverages the enhancements and refinements introduced in GPT-3.5, which include im-
proved natural language understanding, longer consecutive output, and better adherence
to instructions. By utilizing ChatGPT, our study benefits from its ability to retain context
from previous interactions, allowing for more coherent and context-aware responses. This
feature enables ChatGPT to generate high-quality and engaging conversational experiences,
making it an ideal choice for chat-based applications and conversational agents. Further-
more, ChatGPT based on GPT-3.5 offers advanced natural language processing capabilities,
enabling it to perform tasks such as summarization, question answering, and handling
large datasets with enhanced accuracy and relevance. Generally, GPT is a general-purpose
language model developed by OpenAI, while ChatGPT is a variant of GPT specifically
designed for conversational interactions.
In the proposed methodology, we adopted an interactive approach by engaging in
conversations with ChatGPT. To ensure effective interaction, we carefully prepared prompts
that prompted ChatGPT to generate responses in a conversational manner. Notably, we
made a deliberate decision to retain the conversation history throughout the interaction.
By intentionally preserving the dialogue context and not clearing the conversation history
before generating new responses, we observed a significant improvement in the learn-
ing and performance of ChatGPT. Retaining the conversation history allows ChatGPT
to maintain a contextual understanding of the ongoing conversation, resulting in more
coherent and relevant responses. This approach enables ChatGPT to effectively build upon
the previous exchanges, consider the entirety of the conversation’s context, and provide
responses that are not only accurate but also contextually appropriate. By leveraging the
full conversational context, our methodology harnesses the true potential of ChatGPT
based on GPT-3.5 and enhances the overall quality of the interactive experience.

2.2. Automation of SR Process Using ChatGPT


This study utilized a mixed-methods research design, combining ChatGPT, an AI-
driven language model, with traditional SR methods to automate and streamline the review
process while enhancing its efficiency and reliability. By bridging the gap between tradi-
tional SR methods and AI language models, this approach facilitated a comprehensive
exploration of the research topic through qualitative and quantitative analyses. Qualitative
analysis identified trends, challenges, gaps, and recommendations within selected studies,
while quantitative analysis evaluated ChatGPT’s performance compared to expert assess-
ments. This methodology involved iterative stages depicted in Figure 1, where ChatGPT
automated specific tasks while ensuring result accuracy and reliability through human
oversight. These stages encompassed extracting research questions, generating Boolean
research terms (BSTs), filtering publications based on abstracts, conducting full-text fil-
tration and information extraction, and performing comprehensive content analysis. The
following subsections provide a comprehensive and detailed description of the proposed
methodology, encompassing each stage of the automation process.
Systems 2023, 11,
Systems 2023, 11, 351
351 55of
of 38
37

Figure 1.
Figure 1. Overview
Overview of
of the
the SR
SR Process Automation Stages.
Process Automation Stages.

2.2.1. Initialization, Extraction


2.2.1. Initialization, Extraction ofof Research
Research Words
Words and
and Articles
Articles Records
Records
The
The methodology
methodologyfor forautomating
automatingSRSR process
processsteps involves
steps the following
involves procedures.
the following proce-
Firstly, a suitable
dures. Firstly, database
a suitable is chosen
database as the primary
is chosen source of
as the primary information.
source A crucial
of information. step in
A crucial
commencing the SR article
step in commencing the SR involves
article identifying and including
involves identifying and pertinent
includingarticles addressing
pertinent articles
the research questions within the SR. To facilitate this process, it becomes
addressing the research questions within the SR. To facilitate this process, it becomes im- imperative to
generate BSTs capable of effectively searching through diverse databases,
perative to generate BSTs capable of effectively searching through diverse databases, suchsuch as Scopus,
Google
as Scopus,Scholar,
Google or Scholar,
Web of Science.
or Web of ToScience.
enhanceTothe qualitythe
enhance of quality
responses from ChatGPT,
of responses from
which
ChatGPT, which utilizes reinforcement learning [45], we implemented a strategy input
utilizes reinforcement learning [45], we implemented a strategy of gradual of
of grad-
questions. General questions about the research topic are initially posed, followed
ual input of questions. General questions about the research topic are initially posed, fol- by more
specific
lowed by inquiries to stimulate
more specific ChatGPT’s
inquiries understanding
to stimulate ChatGPT’sand generate accurate
understanding responses.
and generate ac-
This
curate responses. This approach facilitates a progressive refinement of ChatGPT’senables
approach facilitates a progressive refinement of ChatGPT’s understanding and under-
the generation of accurate responses. Following the initialization process, ChatGPT is
standing and enables the generation of accurate responses. Following the initialization
informed about the objective of conducting a SR within a specific research area. ChatGPT
process, ChatGPT is informed about the objective of conducting a SR within a specific
Systems 2023, 11, 351 6 of 37

leverages this information to generate search terms or BSTs tailored to the selected database.
These BSTs are designed to refine the search and include relevant keywords associated with
the research topic. It is important to note that while ChatGPT streamlines the search process,
manual searching remains necessary to account for potential formatting inconsistencies
or limitations, ensuring the accurate retrieval of relevant articles. This manual search
complements the automated search process and serves to validate the results obtained
from ChatGPT.
To evaluate ChatGPT’s proficiency in keyword extraction, it is assigned the task of
identifying frequently used keywords based on the BSTs employed for publication extrac-
tion. The extracted keywords are then compared with keywords obtained from established
software tools (e.g., VOSviewer software) for validation and analysis. This comparative
analysis facilitates the assessment of the degree of overlap and potential differences in the
extracted keywords, ensuring the reliability of the keyword extraction process.

2.2.2. Articles Filtration Using Titles and Abstracts


Traditionally, the initial filtration of articles in the SR process involves manual investi-
gation of abstracts, which is considered time-consuming and prone to human errors. To
overcome these challenges, an alternative approach is being employed using ChatGPT to
perform the filtration process. Initially, broad categories of interest are identified based
on an analysis of research trends in the field. These categories are selected to encompass
the key focus areas and ensure that the filtration process targets the most relevant articles
within those domains. To better elaborate on the capabilities of ChatGPT, the problem
is restructured as a classification task, where ChatGPT is assigned the responsibility of
categorizing articles into specific predefined categories. In cases where an article does not
fit into any of these categories, ChatGPT should classify it as irrelevant or under the “not
related” category. To assess the classification abilities of ChatGPT across various input
scenarios, two task scenarios are conducted. In the first scenario (i.e., ChatGPT (APA)),
ChatGPT is provided with only the article APA reference as input, while in the second
scenario (i.e., ChatGPT (APA + Abstract)), both the article APA reference and abstract are
included as input. By employing these two scenarios, we are able to examine how the
inclusion of Supplementary Information affected the accuracy of the classification results,
enabling a comprehensive evaluation of ChatGPT’s performance with different input levels.
By comparing the results of these two scenarios, the impact of including Supplementary
Information on the classification accuracy can be assessed, allowing for determining the
most suitable methodology for automating the initial articles filtration process.
As the classification of articles utilizing ChatGPT represents a novel approach, it is
of utmost importance to establish a robust evaluation methodology that can accurately
assess its performance. Recognizing the significance of evaluation, we embarked on a
comprehensive evaluation process incorporating expert volunteers’ invaluable opinions
and expertise to provide a comprehensive and reliable assessment. These volunteers,
consisting of researchers and engineers with varying levels of expertise in water and
wastewater management, provided a benchmark against which ChatGPT’s classification
outcomes were compared. The evaluation process incorporates human interpretation and
contextual understanding, enriching the assessment with valuable feedback and insights.
Expert volunteers are given a questionnaire containing article titles and abstracts to evaluate
and classify. Transparency is a key aspect of our evaluation approach. To better evaluate
the agreement between raters and to decrease human biases, we evaluate the inter-rater
reliability of the volunteer responses using Cohen’s kappa [46]. Based on this analysis, we
can estimate the consistency of classifications among volunteers and identify any unreliable
raters. Raters with a low kappa value or a lack of agreement with other raters will be
excluded from further analysis to ensure the process’ accuracy and reliability.
Furthermore, a confusion matrix will be constructed to assess the relationship between
expert classification (i.e., benchmark) and ChatGPT’s classifications based on the two
different scenarios. The confusion matrix is a widely used tool to evaluate the identification
Systems 2023, 11, 351 7 of 3

Systems 2023, 11, 351 7 of 37

identification accuracy between actual and predicted values in classification tasks. It pro
vides valuable
accuracy between insights
actual andinto the precision
predicted values inand accuracytasks.
classification of the classification
It provides model [47
valuable
insights into the precision
The confusion and accuracy
matrix consists of Trueof the classification
Positive model
(TP), True [47]. The
Negative confusion
(TN), False Positiv
matrix consists
(FP), and ofNegative
False True Positive
(FN)(TP), TrueThe
values. Negative
diagonal(TN), False Positive
values (FP), and
of the matrix False the co
represent
Negative (FN) values. The diagonal values of the matrix represent the correctly identified
rectly identified samples, while FP and FN represent incorrect predictions. As depicted i
samples, while FP and FN represent incorrect predictions. As depicted in Figure 2, the
Figure 2, the confusion matrix will allow us to calculate various performance metrics suc
confusion matrix will allow us to calculate various performance metrics such as precision,
as precision,
accuracy, accuracy,
and F1-score and
based on F1-score based
the TP, TN, onFN
FP, and thevalues.
TP, TN,OurFP, and FNwill
evaluation values. Our evalua
consider
tion
the willclassifications
expert consider the(i.e.,
expert classifications
benchmark) (i.e., benchmark)
as true values and ChatGPTasclassifications
true valuesas and
theChatGP
classifications
predicted values.as the predicted values.

Figure
Figure 2. 2. Illustration
Illustration of the
of the components
components of theof the confusion
confusion matrix matrix
and the and the equation
equation used to estima
used to estimate
the assessment metrics. Symbols in green cells represent the number of correctly classified samples, sample
the assessment metrics. Symbols in green cells represent the number of correctly classified
whilesymbols
while symbols in in magenta
magenta represent
represent the the number
number of misclassified
of misclassified samples.
samples. q, r, s,q,and
r, s,t represent
and t represent th
the total number of articles belonging to different categories, while u, v, w, and x representrepresent
total number of articles belonging to different categories, while u, v, w, and x the total the tot
number
number of of ChatGPT
ChatGPT classifications.
classifications.

2.2.3.
2.2.3.Full-Text
Full-TextFiltration andand
Filtration Information Extraction
Information Extraction
After the initial articles’ filtration using titles and abstracts, a second round of article
After the initial articles’ filtration using titles and abstracts, a second round of articl
filtration is traditionally conducted to evaluate the suitability of the remaining articles
filtration
for inclusion is in
traditionally
the review andconducted
to extractto evaluate
valuable the suitability
information of theHowever,
from them. remaining articles fo
this
inclusion
manual in the
reading review
process can and to extract valuable
be time-consuming. information
To address from an
this challenge, them. However, th
automated
manual reading process can be time-consuming. To address this
approach utilizing ChatGPT is employed for full-text filtering. The approach focuses challenge, anonautomate
approach sub-categories
identifying utilizing ChatGPT withiniseach
employed for full-text
main category, filtering.
enabling Theexploration
a targeted approach of focuses o
specific areas of
identifying interest, and ensuring
sub-categories withinaeachcomprehensive coverage
main category, of diverse
enabling topics relevant
a targeted exploration o
tospecific
the review.
areas Careful selection
of interest, andof these sub-categories
ensuring allows forcoverage
a comprehensive two primary objectives:
of diverse topics rele
extracting
vant to the review. Careful selection of these sub-categories allows for two do
relevant information for each sub-category and eliminating articles that not
primary objec
align with the research goals. To automate the information extraction process, a prompt is
tives: extracting relevant information for each sub-category and eliminating articles tha
designed to solicit ChatGPT’s recommendations for relevant questions related to each sub-
do not align
category. withresponses
ChatGPT’s the research goals.
will help To automate
extract informationthefrominformation extraction
the articles and eliminateprocess,
prompt is designed to solicit ChatGPT’s recommendations for relevant questions relate
to each sub-category. ChatGPT’s responses will help extract information from the article
and eliminate irrelevant studies. Accordingly, two task scenarios are conducted to evalu
ate ChatGPT’s efficacy in automating this process. The first scenario involves providin
Systems 2023, 11, 351 8 of 37

irrelevant studies. Accordingly, two task scenarios are conducted to evaluate ChatGPT’s
efficacy in automating this process. The first scenario involves providing ChatGPT with
only the article reference as an input (i.e., ChatGPT (APA)), while in the second scenario,
the input includes the article’s relevant sections, such as abstracts, methodologies, and
some parts of the results and discussions. The length of the prompts is adjusted to balance
obtaining reliable responses from ChatGPT and saving time.
It is important to highlight that in the second scenario, the relevant information in the
articles includes data presented in tabular and figure formats, which constitute a significant
amount of details influencing the quality of the extracted information. To address these
limitations, we took measures to incorporate tabular information into the input provided
to ChatGPT. This inclusion of structured data from tables aimed to enhance the model’s
understanding and improve the accuracy of its responses. However, it is essential to
acknowledge that models such as ChatGPT may not possess the specific capability to
interpret visual data when it comes to extracting information from figures. Therefore, we
recommend that researchers carefully analyze figures and rely on human interpretation
to extract relevant information, particularly when the figures contain substantial and
intricate content. By retaining control over full-text filtration and information extraction,
researchers can ensure the accurate interpretation and the inclusion of important details
from non-textual sources.
The evaluation process in this stage is subjective and cannot solely be relied on to
assess ChatGPT’s performance in extracting information. To overcome this limitation, a
collective approach is adopted. The authors collaboratively answer the questions posed to
a subset of articles, following the conventional systematic review process. The agreement
between the authors’ answers and ChatGPT’s responses indicates ChatGPT’s efficacy in
comprehending and extracting information from the articles.

2.2.4. Content Analysis of the Extracted Information


The content analysis of the extracted information is a critical phase in SR methodology,
which traditionally consumes a significant amount of time. This phase focuses on analyzing
the content collected in the previous stages to identify patterns, extract key insights, and
generate comprehensive data statistics. The primary objective is to facilitate a thorough
discussion and evaluation of the research, including identifying research gaps and limita-
tions in previous studies, ultimately leading to informed recommendations. To expedite
this time-consuming process, ChatGPT is utilized for automating the content analysis,
providing efficient analysis capabilities. It is important to emphasize that ChatGPT’s role
is confined to analyzing the given information through text analysis of the questions and
responses. The authors maintain complete control over the conversation, guiding ChatGPT
using specific prompts tailored to the analysis objectives.
The evaluation of ChatGPT’s responses in this stage is subjective and relies on the
expertise and judgment of the authors. While ChatGPT’s responses offer initial analysis,
the authors critically evaluate and validate the generated content. The collected responses
are then compiled and organized to facilitate structured data exploration, allowing for a
rigorous examination of the insights derived from the extracted information. ChatGPT’s
automated responses will serve as a valuable starting point for further exploration and
examination. By incorporating ChatGPT to automate the content analysis process, the
methodology aims to improve efficiency while preserving the authors’ control and over-
sight. This approach enables a streamlined analysis of the extracted information, leading
to a comprehensive discussion, identifying research gaps, and formulating well-informed
recommendations.

2.3. Case Study Selection


To demonstrate the effectiveness of our suggested SR approach, we have intention-
ally selected the topic of Internet of Things (IoT) applications in water and wastewater
management and water quality monitoring. This topic holds immense significance due
Systems 2023, 11, 351 9 of 37

to the transformative impact of IoT in these domains. However, despite the growing
importance and advancements of IoT technologies, there remains a lack of comprehen-
sive reviews that delve into the intricacies of this specific domain [48–51]. Therefore, our
research aims to contribute to the automation of the SR process by leveraging the power
of ChatGPT to conduct an SR in the context of IoT applications in water and wastewater
management. Furthermore, selecting this case study topic is well-aligned with the authors’
background, facilitating better oversight and validation of ChatGPT’s responses. This
ensures the accuracy and reliability of all generated content.
It is worth noting that our case study concentrates on three specific subtopics within
the broader domain of IoT applications in water and wastewater management: IoT-based
water quality monitoring, IoT-based water infrastructure management, and IoT-based
wastewater infrastructure management. These subtopics have been carefully chosen to
comprehensively cover various aspects and applications of IoT technologies in water and
wastewater management. Moreover, they allow for thorough testing of the proposed
methodology through distinct and specific topics under the overarching theme of IoT
application in infrastructure management. This comprehensive approach contributes to
advancing the potential of ChatGPT as a tool for automating SR and understanding IoT
applications in water and wastewater management.

3. Results and Discussion


This section endeavors to provide a thorough exposition of our methodology im-
plementation within the context of the case study focusing on IoT applications in water
and wastewater management alongside water quality monitoring. Furthermore, we will
offer a detailed assessment of the performance and outcomes achieved by ChatGPT across
various sections.

3.1. Research Words Generation, Article Exrcation, and Keywords Retiveal


Figure 3 showcases the flowchart representing the initial phase of our methodology.
For this study, we directed our attention toward the Scopus database as the primary source
of information. To enhance the quality of responses from ChatGPT, we implemented a
strategy of gradual input of questions. Practically, the training of ChatGPT was initiated by
posing general questions pertaining to the research topic. These initial inquiries served as a
foundation for further exploration and understanding. Subsequently, we transitioned to
more targeted and specific questions, delving into various aspects, such as the definition
of IoT, civil infrastructures, and the intersection of infrastructure management with IoT
applications in water and wastewater management. A compilation of these questions
employed during the initialization phase can be found in Table 1.

Table 1. Examples of the question asked to the ChatGPT to feed the Ai with information about the topic.

ID Question
1 What is the Internet of Things?
2 What are the applications of the IoT so far?
3 What are the requirements to build the IoT system?
4 What are the infrastructures from the Civil engineering perspective?
How can the concept of the IoT be implemented in the domain of water
5
and wastewater management?
What are the academic insights about implementing the IoT in water and
6
wastewater management?

Furthermore, additional questions were posed for a comprehensive understanding


of ChatGPT’s capabilities, and the corresponding responses provided by ChatGPT are
displayed in Figures S1–S7. This gradual approach empowered ChatGPT to generate
well-informed, contextually relevant responses, and increasingly refined as we progressed
through our SR methodology.
Systems2023,
Systems 11, 351
2023, 11, 351 1010
of 37
of 38

Figure 3.
Figure 3. The
The flowchart
flowchart depicts
depictsthetheinitial
initialphase
phaseofofthe
thesystematic
systematic review
review with
with thethe ChatGPT.
ChatGPT. TheThe
flowchart shows three primary steps: (1) the development of Boolean research terms,
flowchart shows three primary steps: (1) the development of Boolean research terms, (2) the ex- (2) the extrac-
tion of relevant
traction research
of relevant articles,
research andand
articles, (3) the
(3) extraction of the
the extraction of most common
the most keywords.
common The perfor-
keywords. The
mance of the ChatGPT was evaluated utilizing conventional, cutting-edge techniques
performance of the ChatGPT was evaluated utilizing conventional, cutting-edge techniques for
for conduct-
ing systematic reviews.
conducting systematic reviews.

Upon completing
Upon completingthetheinitialization
initializationprocess,
process,we
weapprised
apprised ChatGPT
ChatGPT of of
ourour intention
intention to to
conduct anan SRSR focusing
focusingonon“IoT
“IoTapplications
applicationsininwater
water and
and wastewater
wastewater management
management andand
water
water quality
qualitymonitoring”.
monitoring”.Surprisingly,
Surprisingly,ChatGPT
ChatGPT generated
generatedBSTsBSTs
derived fromfrom
derived the Scopus
the Sco-
database, as depicted
pus database, in Figure
as depicted 4a, presenting
in Figure an unexpected
4a, presenting and noteworthy
an unexpected outcome.
and noteworthy out-
This successful generation of BSTs highlights the potential of ChatGPT in assisting
come. This successful generation of BSTs highlights the potential of ChatGPT in assisting with
the
withliterature search search
the literature process.process.
Moving Moving
forward,forward,
we includedwe and excluded
included andarticles
excludedfromarticles
the
database by instructing ChatGPT to generate BSTs that constrained the search
from the database by instructing ChatGPT to generate BSTs that constrained the search to to English-
language journal articles
English-language journaland conference
articles papers published
and conference between 2010
papers published and 2022,
between 2010 as
and
demonstrated
2022, as demonstrated in Figure 4b. Furthermore, Figure 4c shows an additionalensure
in Figure 4b. Furthermore, Figure 4c shows an additional request to request
that the BSTs
to ensure thatencompassed publicationspublications
the BSTs encompassed with the BSTs present
with in their
the BSTs titles,
present inabstracts,
their titles,
or keywords. Following these gradual iterations of refinement, the final set of BSTs was
abstracts, or keywords. Following these gradual iterations of refinement, the final set of
obtained, which are as follows: “TITLE-ABS-KEY((“internet of things” or “IoT”) AND
BSTs was obtained, which are as follows: “TITLE-ABS-KEY((“internet of things” or “IoT”)
(“water” OR “wastewater” OR “sewage” OR “sanitation”) AND (“infrastructure” OR
AND (“water” OR “wastewater” OR “sewage” OR “sanitation”) AND (“infrastructure”
“infrastructures”)) AND (LIMIT-TO (DOCTYPE, “ar”) OR LIMIT-TO (DOCTYPE, “cp”))
OR “infrastructures”)) AND (LIMIT-TO (DOCTYPE, “ar”) OR LIMIT-TO (DOCTYPE,
AND (PUBYEAR > 2009 AND PUBYEAR < 2023)”. However, it is essential to note that
“cp”)) AND
despite (PUBYEAR
ChatGPT’s > 2009
assistance AND PUBYEAR
in generating < 2023)”.
the BSTs (refer toHowever, it is
Figure S6), weessential to note
encountered
that despite ChatGPT’s assistance in generating the BSTs (refer to Figure S6), we encoun-
tered inconsistencies in the formatting of references associated with these publications,
study [52] that documented similar issues encountered by ChatGPT models in reference
extraction.

Systems 2023, 11, 351 11 of 37

inconsistencies in the formatting of references associated with these publications, indicating


challenges in the extraction process. These findings corroborate with a previous study [52]
that documented similar issues encountered by ChatGPT models in reference extraction.

(a)

(b)

Figure 4. Cont.
Systems 2023,
Systems 2023, 11,
11, 351
351 12
12 of 38
of 37

(c)

Figure 4.
Figure 4. Response
Response from
from the ChatGPT to
the ChatGPT our request
to our request to
to create
create research
research terms
terms for
for use
use in
in Scopus
Scopus
searches. (a) response with BST, (b) response with BST for the latest 12 years, and (c) response
searches. (a) response with BST, (b) response with BST for the latest 12 years, and (c) response with with
BST for the latest 12 years and include articles and conferences with English language
BST for the latest 12 years and include articles and conferences with English language only. only.

Consequently, we
Consequently, we resorted
resorted to tomanual
manualsearching
searchingononScopus
Scopus in in
order to ensure
order to ensure thethe
ac-
curate retrieval of relevant articles. Table 2 provides examples of
accurate retrieval of relevant articles. Table 2 provides examples of ChatGPT’s responses,ChatGPT’s responses,
illustratingerrors
illustrating errorsininthe
theDOI,
DOI, publication
publication title,
title, or both.
or both. ForFor additional
additional instances
instances of refer-
of references
ences generated
generated by ChatGPT,
by ChatGPT, pleaseplease
refer torefer to Figure
Figure S7. Following
S7. Following the extraction
the extraction of all of all rel-
relevant
evant articles
articles from Scopus,
from Scopus, our focusour focus
shifted shiftedevaluating
towards towards evaluating
the proficiencythe ofproficiency
ChatGPT in of
ChatGPT in
retrieving retrieving
keywords as keywords
part of theas SRpart of theTo
process. SRassess
process.
this,To weassess
assignedthis,ChatGPT
we assigned the
ChatGPT
task the task of
of identifying theidentifying the top used
top 50 frequently 50 frequently
keywords used
based keywords based
on the BSTs on the BSTs
employed for
employed for
publication publication
extraction, extraction,
as illustrated as illustrated
in Figure in Figure 5.of The
5. The effectiveness effectiveness
ChatGPT’s keyword of
ChatGPT’swas
extraction keyword extractionthrough
then evaluated was then evaluated through
a comparative analysis a comparative
with VOSviewer analysis with
software
(1.6.19),
VOSviewer a widely used (1.6.19),
software tool for visualizing
a widely used and analyzing bibliographic
tool for visualizing anddata. By comparing
analyzing biblio-
the keywords
graphic data. Byextracted
comparing by ChatGPT
the keywordswith extracted
those obtained from VOSviewer,
by ChatGPT we sought
with those obtained fromto
assess the degree
VOSviewer, of overlap
we sought and potential
to assess the degree differences in the
of overlap andextracted
potentialkeywords.
differences in the
Tablekeywords.
extracted 3 presents the similarity percentage between the keywords obtained from Chat-
GPT Table
and VOSviewer
3 presents for thedifferent
similarity numbers
percentage of keywords
betweenconsidered.
the keywords Thisobtained
comparativefrom
analysis
ChatGPTallowed us to gauge
and VOSviewer the levelnumbers
for different of agreement between
of keywords ChatGPT’sThis
considered. keyword
compara- ex-
traction and the
tive analysis resultsusgenerated
allowed to gaugeby theVOSviewer. While our
level of agreement findings
between indicated keyword
ChatGPT’s a certain
level of agreement
extraction between
and the results the keywords
generated extracted While
by VOSviewer. by ChatGPT and those
our findings obtained
indicated from
a certain
VOSviewer, we also observed some notable differences (refer to
level of agreement between the keywords extracted by ChatGPT and those obtained from Table 3). Specific unique
keywords
VOSviewer, surfaced
we alsoinobserved
VOSviewer somethat ChatGPT
notable did not(refer
differences identify, and 3).
to Table vice versa.unique
Specific These
differences showed the poor performance of AI-powered keyword
keywords surfaced in VOSviewer that ChatGPT did not identify, and vice versa. These extraction methods
compared
differencestoshowed
traditional software
the poor tools. The
performance of presence
AI-powered of unique
keyword keywords
extractionexclusively
methods
identified
comparedby toVOSviewer
traditional suggests
softwarethat ChatGPT
tools. achieved
The presence ofpartial
unique success in extracting
keywords exclusivelythe
Systems 2023, 11, 351 13 of 37

keywords. Therefore, it is recommended to rely on alternative methods, such as Scoups or


VOSviewer, for a more reliable approach. Such differences in the extracted keywords can
be attributed to factors such as the training data, biases, and algorithmic limitations, which
can impact the effectiveness and accuracy of AI-powered keyword extraction. Further
research in this area would shed light on the strengths and weaknesses of AI models such
as ChatGPT and inform the future refinement and improvement of keyword extraction
techniques. The compilation and summary of the unique keywords obtained from both
ChatGPT and VOSviewer are provided in Table S1, providing a comprehensive overview
of the extracted terms from different perspectives.

Table 2. Examples of the references extracted from the ChatGPT and the evaluation of the correctness
for both title and DOIs.

No. ChatGPT Reference Doi Check Title Check


AlFuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M.,
& Ayyash, M. (2015). Internet of things: A survey on
1 enabling technologies, protocols, and applications. IEEE Correct Correct
Communications Surveys & Tutorials, 17(4), 2347-2376.
https://doi.org/10.1109/COMST.2015.2444095
Akbar, M., & Khan, M. S. (2018). IoT-based smart
wastewater management system. International Journal of DOI not
2 Correct
Distributed Sensor Networks, 14(6), 1550147718781026. found
https://doi.org/10.1177/1550147718781026
Arumugam, S., & Selvamani, S. (2020). Internet of things
(IoT) based smart water management system: A review.
3 Journal of Ambient Intelligence and Humanized Incorrect Correct
Computing, 11(10), 4369-4390.
https://doi.org/10.1007/s12652-020-02770-w
Dong, J., Li, Y., & Li, X. (2019). A review of internet of
things-based smart water management systems. Journal of DOI not
4 Incorrect
Sensors, 2019, 1-11. found
https://doi.org/10.1155/2019/2567391
Kaur, P., Kumar, M., & Singh, P. (2021). IoT-enabled water
management: A review. In I. Ahmad, S. Shafi, S. S. Gill, &
V. Chang (Eds.), Internet of things and big data analytics DOI not
5 Incorrect
towards next-generation intelligence found
(pp. 381-398). Springer.
https://doi.org/10.1007/978-981-33-6965-5_17
Kumar, M., Kumar, V., & Al-Fuqaha, A. (2021). An
overview of cyber-physical system-based water
DOI not
6 management in smart cities. Journal of Sensor and Incorrect
found
Actuator Networks, 10(2), 19.
https://doi.org/10.3390/jsan10020019

Table 3. The similarity percentage between the keywords from ChatGPT and VOSviewer.

Number of Unique Keywords


ChatGPT VOS Viewer Similarity (%)
from ChatGPT
50 50 20 40
100 100 28 72
180 180 23 138
200 200 21 158
50 263 68 16
Systems 2023, 11, 351 14 of 3

Systems 2023, 11, 351 enhance the depth and comprehensiveness of the SR process. In the next 14 phase,
of 37 we wi

explore how ChatGPT can filter and categorize the articles extracted in phase one.

Figure 5. User prompt asking the ChatGPT to retrieve the top 50 keywords and the ChatGPT’s r
Figure 5. User prompt asking the ChatGPT to retrieve the top 50 keywords and the ChatGPT’s
sponse in tabular format.
response in tabular format.

Table 3. The similarity


In summary, percentage
the initial phase ofbetween the keywords
our methodology fromthe
revealed ChatGPT and VOSviewer.
considerable capability
of ChatGPT in generating pertinent BSTs for retrieving relevant articles and the limited
capabilities of extracting
Number of Unique Keywords from
ChatGPT VOSkeywords.
Viewer Regrettably, ChatGPT
Similarity (%) was unable to extract relevant
articles without human guidance autonomously. These preliminary findings ChatGPT
lay a solid
50 for the subsequent
foundation 50 stages of our methodology,
20 which will primarily 40concentrate
on the 100
accurate filtering and100categorization of the
28extracted articles in order to enhance
72 the
depth and comprehensiveness of the SR process. In the next phase, we will explore how
180 180 23 138
ChatGPT can filter and categorize the articles extracted in phase one.
200 200 21 158
3.2. First-Round
50 Article Classification
263 and Filtration
68(Title and Abstract) 16
A total of 496 English language journal articles and conference proceedings relevant
to3.2.
theFirst-Round
research topic wereClassification
Article retrieved fromandtheFiltration
Scopus database
(Title andusing BSTs suggested by
Abstract)
ChatGPT. Figure 6 showcases the flow chart of filtrating and categorizing articles in the first
A total of 496 English language journal articles and conference proceedings relevan
part and extract filtrating and information extraction from related articles in the second part.
to the research
Initially, topic were
we identified retrieved
three from the
broad categories of Scopus database
interest based using
on our BSTs suggested b
comprehensive
analysis of research trends in the field: IoT-based water infrastructure management,articles
ChatGPT. Figure 6 showcases the flow chart of filtrating and categorizing IoT- in th
first part
based and extract
wastewater filtratingmanagement,
infrastructure and information extraction
and IoT-based fromquality
water related articles in the se
monitoring.
ond part.
These categories were selected to encompass the key focus areas in our research and ensure
that the filtration process targeted the most relevant articles within these specific domains.
Systems 2023, 11,
Systems 2023, 11, 351
351 15 of
15 of 37
38

Figure 6. Flow chart of the first and second phases of the filtration process. The figure depicts the
Figure 6. Flow chart of the first and second phases of the filtration process. The figure depicts the de-
details of phase 1 of article filtration and phase 2 of information extraction and sub-categories gen-
tails of phase 1 of article filtration and phase 2 of information extraction and sub-categories generation.
eration.
Initially, we identified three broad categories of interest based on our comprehensive
analysis of research trends in the field: IoT-based water infrastructure management, IoT-
based wastewater infrastructure management, and IoT-based water quality monitoring.
These categories were selected to encompass the key focus areas in our research and en-
Systems 2023, 11, 351
sure that the filtration process targeted the most relevant articles within these specific do-
16 of 37
mains.
To better elaborate on the capabilities of ChatGPT, we transformed the task into a
classification problem, where ChatGPT was asked to assign articles to one of four distinct
To better elaborate on the capabilities of ChatGPT, we transformed the task into a
categories: water management, wastewater management, water quality, or unrelated. To
classification problem, where ChatGPT was asked to assign articles to one of four distinct
facilitate this classification
categories: water process,
management, we requested
wastewater ChatGPTwater
management, to generate
quality,definitions forTo
or unrelated. each
offacilitate
the fourthis
categories, as depicted in Figure 7. ChatGPT responded by generating precise
classification process, we requested ChatGPT to generate definitions for each
definitions
of the fourfor each category,
categories, which
as depicted would7.subsequently
in Figure serve asby
ChatGPT responded guiding principles
generating precisefor
categorizing
definitions forarticles (see Figure
each category, 7). would
which By incorporating these
subsequently guidelines,
serve as guidingwe aimed to
principles foren-
hance the accuracy and consistency of ChatGPT’s classification outputs, thus optimizing
categorizing articles (see Figure 7). By incorporating these guidelines, we aimed to enhance
the
thesubsequent
accuracy and stages of our methodology.
consistency of ChatGPT’s classification outputs, thus optimizing the
subsequent stages of our methodology.

Figure 7. User prompt asking the ChatGPT about its information about the three main categorizes.
Figure 7. User prompt asking the ChatGPT about its information about the three main categorizes.

We
Weevaluated
evaluated the classification/discarding
classification/discardingperformance
performanceofofChatGPT
ChatGPT in in
twotwo distinct
distinct
scenarios
scenariosby bycomparing
comparingthe theperformance
performancetotothe thehuman
humanexperts’
experts’evaluations.
evaluations.This task
This was
task
executed by carefully
was executed crafting
by carefully prompts
crafting for ChatGPT
prompts for ChatGPTandand
ensuring thatthat
ensuring each
eachprompt
prompt con-
contained
tained 10 articles/time
10 articles/time and and
APAAPA references.
references. By limiting
By limiting the the number
number of articles
of articles in in
each
each prompt, we aimed to balance information comprehensiveness and manageable
prompt, we aimed to balance information comprehensiveness and manageable input sizes input
sizes
for for ChatGPT.
ChatGPT. Moreover,
Moreover, we imposed
we imposed specific
specific constraints
constraints during during the classification
the classification process
process to maintain consistency and control. These constraints encompassed
to maintain consistency and control. These constraints encompassed categorizing categorizing
articles
articles exclusively into the predefined four categories, refraining from making
exclusively into the predefined four categories, refraining from making assumptions, fo- assumptions,
focusing on articles directly related to the three main categories of interest, and presenting
cusing on articles directly related to the three main categories of interest, and presenting
the classification results in a structured tabular format.
the classification results in a structured tabular format.
Upon preparing the prompts, ChatGPT generated responses that included the clas-
Upon preparing the prompts, ChatGPT generated responses that included the clas-
sification output in a visually organized table (Figure 8). Within this table, “x” markings
sification
indicatedoutput in a visually
the assigned organized
category for each table (Figure
article, while 8). Within this table,
accompanying “x” markings
explanations pro-
indicated the assigned
vided insights category decision-making
into the underlying for each article,process
whileemployed
accompanying
by ChatGPT explanations
(refer
to Figure 8). This comprehensive representation facilitated the interpretation of ChatGPT’s
classification outcomes and allowed for a deeper understanding of the rationale behind
each categorization.
Systems 2023, 11, 351 17 of 38

provided insights into the underlying decision-making process employed by ChatGPT


(refer to Figure 8). This comprehensive representation facilitated the interpretation of
Systems 2023, 11, 351
ChatGPT’s classification outcomes and allowed for a deeper understanding of the 17 ra-of 37
tionale behind each categorization.

Figure 8. APA-style article filtration procedure (feeding rate 5 articles per time). (a) The prompt for
Figure 8. APA-style article filtration procedure (feeding rate 5 articles per time). (a) The prompt for
the user. (b) The response of the ChatGPT to the requirement. The ChatGPT presents the answers
the user. (b) The response of the ChatGPT to the requirement. The ChatGPT presents the answers in
in a tabular format with an “x” next to the corresponding category. The ChatGPT explains the deci-
a tabular
sion format
beneath with an “x” next to the corresponding category. The ChatGPT explains the decision
the table.
beneath the table.
To assess the classification and the discarding of articles, we carefully selected a sub-
set ofTo assess
120 the classification
articles, and the discarding
comprising approximately 25% of of the
articles,
total we carefully
articles (496),selected a subset
representing
all four categories. We then organized the titles and abstracts of these articles and shared all
of 120 articles, comprising approximately 25% of the total articles (496), representing
four categories.
them We then
with the experts organized
using Google theFormstitles
toand abstracts
facilitate of these articles
the management andevaluation
of the shared them
with the experts using Google Forms to facilitate the management of
process. A sample of the questions, including the article’s title and abstract, illustrating the evaluation process.
the format used in the questionnaire is attached in Figure S8. We provided the article format
A sample of the questions, including the article’s title and abstract, illustrating the title
usedabstract
and in the questionnaire
as this is theisfollowed
attachedmethod
in Figure inS8.
theWe provideddiscarding
traditional the article process
title andofabstract
the
as this isTo
articles. theflexibly
followed method
account forinarticles
the traditional
that maydiscarding process
cover multiple of the articles.
categories, To flexibly
we permitted
account fortoarticles
volunteers select that may cover
a maximum multiple
of two categories,
categories, but notweone
permitted
of themvolunteers to select a
the” not related”
maximum
for of two
the selected categories,
articles. but not one
This approach of them the”the
acknowledged not related” for
complexity the selected
of some articles,articles.
en-
suring that they were not constrained to a single classification. The volunteers’ responsesnot
This approach acknowledged the complexity of some articles, ensuring that they were
constrained
were to a singleinto
then converted classification.
a numericalThe volunteers’
scale, where the responses were then
four predefined converted
categories into a
were
represented by the numbers 1, 2, 3, and 4, making quantitative analysis and comparison 1,
numerical scale, where the four predefined categories were represented by the numbers
2, 3, and 4, making quantitative analysis and comparison easier.
easier.
Aftereliminating
After eliminatingincorrect
incorrect raters
raters based
based on Cohen’s
on Cohen’s Kappa Kappa coefficient
coefficient values,values,
we em-we
employed the majority vote approach to determine the final category
ployed the majority vote approach to determine the final category for each article. This for each article. This
consensus-basedclassification
consensus-based classificationwas wasthen
thenusedusedasasa abenchmark
benchmark to to evaluate
evaluate thethe filtration
filtration
process of
process of ChatGPT.
ChatGPT.Table TableS2S2provides
providesa adetailed
detailed breakdown
breakdown of of
thethe
final categories
final categoriesassigned
as-
to the articles
signed based on
to the articles the on
based majority vote ofvote
the majority the ofvolunteers.
the volunteers.
Figure 9(a1)
Figure 9(a1)shows
showsthe theconfusion
confusionmatrix matrixofofthethecomparison
comparison between
between thethe benchmark
benchmark
(true classifications) and the classifications from ChatGPT (APA). By analyzing thefindings,
(true classifications) and the classifications from ChatGPT (APA). By analyzing the find-
we observed
ings, we observedthat that
the “not related”
the “not class
related” achieved
class achieved a promising
a promisingaccuracy
accuracyofof78.00%,
78.00%, an
F1-score
an F1-score of of
81.00%,
81.00%,andanda arecall
recallof of80.00%.
80.00%. This Thisindicates
indicatesthat thatChatGPT
ChatGPT demonstrated
demonstrated
effective performance
effective performancein inremoving
removingirrelevant
irrelevantarticles.
articles. However,
However, forfor
thethe remaining
remaining classes,
classes,
F1-scoreswere
the F1-scores werelower
lowerthan
than80%.80%.These
Theselower
lower accuracies
accuracies were
were expected,
expected, since
since ChatGPT
ChatGPT
solely on
relied solely on APA
APA information
informationfor forclassification.
classification.
The generation
The generation of of the
theconfusion
confusionmatrix matrixprovided
provideda acomprehensive
comprehensive evaluation
evaluation of of
ChatGPT’s performance.
ChatGPT’s performance.While Whileemploying
employing ChatGPT
ChatGPT (APA)
(APA) in in
thethe classification
classification process
process
exhibited promising results in filtering out irrelevant articles, there is room for improvement
in its classification accuracy for other categories. It is worth mentioning that the sole
dependence on the APA information to filter was an intentional choice aimed at assessing
ChatGPT’s performance at different stages and input levels, even though it deviated
from conventional methods. However, recognizing the potential limitations of relying
solely on APA information, we sought to improve the accuracy of the filtering process by
the token limit would require truncating or omitting input parts, potentially losing im-
portant information. Therefore, we limited the number of articles in each prompt to five
per time. This decision was made considering the average token length of APA infor-
mation and article abstracts and not to confuse the ChatGPT model. By incorporating ar-
ticle abstracts into the classification process, we aimed to address the potential limitations
Systems 2023, 11, 351 of relying solely on APA information. Abstracts often provide a concise summary of18an of 37
article, offering valuable contextual cues that can aid in accurate classification. Figure 10
provides a visual representation of the process, illustrating how ChatGPT was fed with
prompts containing
incorporating both APAThis
article abstracts. and modified
abstract information, and it showcases
approach, ChatGPT the system’s
(APA + Abstract), aimed
classification responses.
to leverage both the APA and abstract information to enhance the system’s performance.

Figure 9. The confusion matrix comparing the classification of the articles by experts and the
Figure The and
9. (a1)
ChatGPT. confusion matrix
(b1) display comparing
confusion the classification
matrices, while the (a2)ofand
the(b2)
articles
depictby
theexperts and the
performance
ChatGPT. (a1,b1) display confusion
metrics of categorization process. matrices, while the (a2,b2) depict the performance metrics of
categorization process.
It can be observed that the classification process conducted by ChatGPT (APA + Ab-
To occasionally
starct) implement the classification
results processtwo
in assigning of the articles using
categories for a the ChatGPT
single article.(APA + Abstract)
While this is
approach, we obtained the APA and abstract information of the articles from
deemed acceptable when the two categories do not include the “Not related” category, Scopus in a
CSV file format. This allowed us to gather the necessary data for creating prompts that could
be fed into ChatGPT. However, it is crucial to consider that the performance of ChatGPT
models is mainly constrained by token length and capacity [53]. Each token represents a
text unit, such as a word or character. The maximum token limit for ChatGPT models is a
crucial factor to consider when designing prompts. Exceeding the token limit would require
truncating or omitting input parts, potentially losing important information. Therefore, we
limited the number of articles in each prompt to five per time. This decision was made
considering the average token length of APA information and article abstracts and not
to confuse the ChatGPT model. By incorporating article abstracts into the classification
process, we aimed to address the potential limitations of relying solely on APA information.
Abstracts often provide a concise summary of an article, offering valuable contextual cues
that can aid in accurate classification. Figure 10 provides a visual representation of the
process, illustrating how ChatGPT was fed with prompts containing both APA and abstract
information, and it showcases the system’s classification responses.
It can be observed that the classification process conducted by ChatGPT (APA + Abstarct)
occasionally results in assigning two categories for a single article. While this is deemed
acceptable when the two categories do not include the “Not related” category, indicating
that the article covers distinct topics, complications arise when an article is classified as
both relevant and “Not related”. This situation can pose challenges for users, particularly
due to the criticality of accurately including or excluding articles in the SR process.
which is “Not related”, we leveraged the explanations provided by ChatGPT to assist in
confirming decisions regarding article inclusion or exclusion. Practically, we collected the
articles that ChatGPT assigned two categories and re-requested their classification. How-
ever, this time, we provided ChatGPT with the explanations accompanying its initial clas-
sifications. In practical applications, we recommended reading the justification provided
Systems 2023, 11, 351 19 of 37
by the ChatGPT for the articles classified into two classes to confirm the relevance of the
article or not.

Figure
Figure10.
10. (a) An illustration
(a) An illustrationofofChatGPT
ChatGPTinput
input utilizing
utilizing APA
APA metadata
metadata andand the abstract.
the abstract. (b)
(b) Chat-
ChatGPT’s response to the request. ChatGPT classified the article as both unrelated and in the water
GPT’s response to the request. ChatGPT classified the article as both unrelated and in the water
quality category. Nonetheless, reviewing the explanation from the user’s perspective would aid in
quality category. Nonetheless, reviewing the explanation from the user’s perspective would aid in
determining that the article is unrelated.
determining that the article is unrelated.
Similarly, we evaluated the performance of the classification from ChatGPT (APA +
Notably, ChatGPT occasionally tends to retain articles to the maximum extent, even if
Abstract) by comparing ChatGPT’s results (APA + Abstract) to our benchmark, which
they are unrelated, by assigning them to the closest corresponding category. Figure 10 pro-
consisted of the opinions of experts. This evaluation aimed to assess the efficacy of the
vides an illustration of an article being classified into two categories, with one of them being
filtration process, particularly in relation to the “Not related” class (Figure 9(b1)). The re-
“Not related”. Alongside the classification outputs, ChatGPT also provides justifications
sults showed
for its significant
selections, which improvement when applying
are pivotal in informing ChatGPT (APA process.
the decision-making + Abstract) com-
ChatGPT
pared to ChatGPT (APA) alone. Regarding precision, recall, and F1-score, the
provides insights into the factors and reasoning underlying its decisions by explainingChatGPT
(APA + Abstract) achieved
its classifications. impressive
This justification valuesserves
feature for theas“Not related”
a valuable class,
tool with scoresand
for evaluating of
validating the appropriateness of the classification decisions.
To address the challenge posed by articles being classified into two categories, one
of which is “Not related”, we leveraged the explanations provided by ChatGPT to assist
in confirming decisions regarding article inclusion or exclusion. Practically, we collected
the articles that ChatGPT assigned two categories and re-requested their classification.
However, this time, we provided ChatGPT with the explanations accompanying its initial
classifications. In practical applications, we recommended reading the justification pro-
vided by the ChatGPT for the articles classified into two classes to confirm the relevance of
the article or not.
Similarly, we evaluated the performance of the classification from ChatGPT (APA + Abstract)
by comparing ChatGPT’s results (APA + Abstract) to our benchmark, which consisted of
the opinions of experts. This evaluation aimed to assess the efficacy of the filtration process,
particularly in relation to the “Not related” class (Figure 9(b1)). The results showed signifi-
cant improvement when applying ChatGPT (APA + Abstract) compared to ChatGPT (APA)
alone. Regarding precision, recall, and F1-score, the ChatGPT (APA + Abstract) achieved
impressive values for the “Not related” class, with scores of 85.00%, 93.00%, and 90.00%,
respectively. These metrics outperformed the corresponding scores obtained by ChatGPT
(APA) (Figure 9(b2)). Furthermore, the F1-scores for the three other classes, namely, wa-
ter management, wastewater management, and water quality, were also notably higher,
with scores of 91.00%, 87.00%, and 86.00%, respectively. The implementation of ChatGPT
(APA + Abstract) led to a reduction in misclassification rates of approximately 64% com-
Systems 2023, 11, 351 20 of 37

pared to ChatGPT (APA), demonstrating its capacity for improved accuracy. Additionally,
other evaluation measures, such as accuracy, macro-F1, and weighted F1, experienced
enhancements. These findings collectively underscore the exceptional performance of
ChatGPT (APA + Abstract) in effectively filtering and categorizing articles, positioning it as
a valuable tool for subsequent classification and article exclusion with enhanced precision.
However, it is important to acknowledge that certain limitations remain, particularly
regarding the number of articles that can be filtered simultaneously. While ChatGPT
exhibits remarkable capabilities, practical constraints need to be considered when scaling
up its application. This evaluation provides valuable insights into the effectiveness and
potential of ChatGPT (APA + Abstract) as a robust classification system, offering improved
precision and reliability in filtering and categorizing scientific articles. By combining AI-
driven classification strengths with human evaluators’ expertise, we can harness the power
of automation while ensuring the highest standards of accuracy and relevance.
Despite the limitation on the feeding rate of articles into ChatGPT, it continues to
surpass traditional filtering methods in terms of time efficiency. The performance of Chat-
GPT (APA + Abstract) in article filtering is considered outstanding. Therefore, ChatGPT
(APA + Abstract) was utilized to screen all articles within the study. The comprehensive
results of the filtering and categorizing of all articles can be found in Tables S3–S6. It is
important to note that the output of this step goes beyond the elimination of articles; it
also involves categorizing relevant articles into three main classes. Following the filtra-
tion process, a total of 351 articles were discarded as they were deemed irrelevant, while
145 articles were retained as relevant. The relevant articles were categorized into specific
domains, with 76 articles on water management, 53 on wastewater management, and 32 on
water quality. It is important to acknowledge that specific articles may overlap and fall
into multiple categories, resulting in 161 articles across the three domains. However, when
considering unique articles, the total count stands at 145.
Ultimately, the utilization of ChatGPT (APA + Abstract) in the filtration and catego-
rization process demonstrates its effectiveness in efficiently managing a large volume of
articles, streamlining the identification of relevant content, and facilitating the organization
of articles based on their thematic relevance. By leveraging the capabilities of AI-powered
classification, researchers can optimize their workflow, allocate their time more effectively,
and enhance the accuracy and precision of their literature review processes.

3.3. Second-Round Article Filtration (Full-Text) and Information Extraction


The full-text filtration and information extraction phase was carried out during a
second round of article filtration to evaluate the suitability of the remaining 145 articles
for inclusion in our review and extract valuable information from them. This challenge
was addressed by utilizing the capabilities of ChatGPT for full-text filtering, as illustrated
in Figure 6. To effectively leverage ChatGPT for this purpose, we initially identified five
sub-categories within each main category to concentrate on specific areas of interest and
ensure a comprehensive exploration of the diverse topics relevant to our review. These
sub-categories were thoughtfully selected to cover diverse aspects of the subject matter,
including sensors and sensing technology, data acquisition and transmission, data analytics
and visualization, applications, case studies, and research gaps and trends.
To automate extracting information and harness the capabilities of ChatGPT, we de-
vised a prompt that solicited ChatGPTs’ recommendations for relevant questions pertaining
to each sub-category. The response generated by ChatGPT to this request is depicted in
Figure 11, while Figure 12 showcases the 14 questions that were generated belonging to the
five sub-categories. It is important to note that these questions generated are of a general
nature and elicit responses in the form of “yes” or “no”. The answers to these questions by
ChatGPT would help extract information from the articles and remove irrelevant articles.
In this phase, we tested the performance of ChatGPT in two scenarios, including ChatGPT
(APA) and ChatGPT (APA + Abstract + relevant information). Practically, the ChatGPT
prompts were constructed using the article’s APA, abstract, methodology, discussion, and
Systems 2023, 11, 351 21 of 37

occasionally the conclusions section. Due to the extended length of these extracted sections
from the articles compared to the previous steps (i.e., abstract only), the ChatGPT prompts
Systems 2023, 11, 351 21 of 38
were designed to handle one article at a time.
However, as previously discussed, the prompt’s length is carefully adjusted to balance
obtaining reliable responses from ChatGPT and saving time. It is worth noting that the time
to the five
invested sub-categories.
in this It is important
step is considerably less thantothenote
timethat these questions
of manual execution,generated
particularlyare of a
general nature
considering and elicit
the added benefitresponses in theextraction
of information form of alongside
“yes” or “no”. The answers
the article’s filtration.to these
During by
questions theChatGPT
assessment of ChatGPT’s
would responses
help extract to the 14from
information questions, we observed
the articles and removethree irrel-
distinct
evant scenarios.
articles. InFirstly, when we
this phase, the answers
tested theto performance
a question were of “yes,” ChatGPT
ChatGPT in twoconfirmed
scenarios, in-
this affirmative
cluding ChatGPTresponse
(APA) andand provided
ChatGPT relevant
(APAinformation
+ Abstract +from the article
relevant that corre-
information). Practi-
sponded
cally, the ChatGPT prompts were constructed using the article’s APA, abstract,were
to the question (refer to Figure 13). Secondly, in instances where the answers method-
“No”, ChatGPT simply reported “No” without furnishing any straightforward answers
ology, discussion, and occasionally the conclusions section. Due to the extended length of
derived from the article (as shown in Figure 14). Lastly, when ChatGPT determined that
these extracted sections from the articles compared to the previous steps (i.e., abstract
the majority of answers were “No”, it classified the paper as “unrelated” (as shown in
only), the ChatGPT prompts were designed to handle one article at a time.
Figure S9).

Figure
Figure 11.11.
TheThe ChatGPT’s
ChatGPT’s response
response to our
to our request
request for proposing
for proposing research
research questions
questions that fitthat
intofiteach
into each
class. There are 14 questions in
class. There are 14 questions in all.
all.

However, as previously discussed, the prompt’s length is carefully adjusted to bal-


ance obtaining reliable responses from ChatGPT and saving time. It is worth noting that
the time invested in this step is considerably less than the time of manual execution, par-
ticularly considering the added benefit of information extraction alongside the article’s
filtration.
Systems2023,
Systems 2023,11,
11,351
351 22 22
of of
3837

Figure 12.
Figure 12. Our systematic
systematic review
review taxonomy.
taxonomy. The
Thefirst
firstlevel
levelrepresents
representsthe thethree
threecategories
categoriesofofthe
the
review,the
review, thesecond
secondlevel
leveldepicts
depictsthe
the sub-categories,
sub-categories, and
and thethe third
third level
level illustrates
illustrates questions
questions to aid
to aid with
with information extraction. The 14 questions and five sub-categories are identical for each main
information extraction. The 14 questions and five sub-categories are identical for each main category.
category.
In this phase, we evaluated ChatGPT’s performance by comparing its responses to
During
individual the assessment
articles (we selected of one
ChatGPT’s responses
article known to the
for the 14 questions,
authors we observed
as an example). Initially,
three distinct scenarios. Firstly, when the answers to a question
we asked ChatGPT to answer these questions based on the article’s APA information. were “yes,” ChatGPT con-
firmed this
However, asaffirmative
demonstrated response
in Figureand14,provided relevant provided
where ChatGPT information from the
incorrect article that
responses, APA
corresponded to the question (refer to Figure 13). Secondly, in
information proved to be inadequate. For example, in Answer 1-1, ChatGPT mistakenly instances where the an-
swers were
claimed that“No”, ChatGPT
the author usedsimply reported
the wrong type“No” without
of sensors, furnishing
and in Answer any4-1,straightfor-
ChatGPT
ward answers derived from the article (as shown in Figure 14).
inaccurately identified the research location as Saudi Arabia instead of Hong Kong. Lastly, when ChatGPT
determined that the
To improve the accuracy
majority ofof answers
ChatGPT’s were “No”, it we
responses, classified the paper
supplemented itsasunderstanding
“unrelated”
(as shown in Figure S9).
by incorporating additional information from the articles themselves. We considered
variousIn sections,
this phase, we evaluated
including ChatGPT’s
the titles, abstracts,performance
methodology bydescriptions,
comparing its responses
relevant partstoof
individual
the results, articles (we selected
and conclusions, asone article
these known
sections for provided
often the authors as andetailed
more example). andInitially,
context-
we asked ChatGPT to answer these questions based on the article’s
rich information compared to abstracts alone. However, we intentionally excluded article APA information.
However, as demonstrated
introductions and related work in Figure
sections 14,towhere
maintain ChatGPT
clarity provided
and avoidincorrect
confusion. responses,
Figure 15
APA information proved to be inadequate. For example, in Answer
provides an example of a ChatGPT prompt with a title, abstract, methodology description, 1-1, ChatGPT mistak-
enly claimed that the author used the wrong type of sensors, and
and ChatGPT’s response to the questions. In this example, we used the same article asin Answer 4-1, ChatGPT
inaccurately
before, and it identified
is evident the
thatresearch location
the quality as Saudi responses
of ChatGPT’s Arabia instead of Hong Kong.
has significantly improved.
For instance, in Answer 1-1, ChatGPT accurately reported the use of 58 ultrasonic sensors,
and in Answer 4-1, ChatGPT correctly identified the research area’s location.
Systems 2023, 11, 351 23 of 37
Systems 2023, 11, 351 23 of 38

Figure 13. Illustration of a ChatGPT question-answer request prompt. The sole input was the APA
Figure
article13. Illustration
format. The leftofpanel
a ChatGPT
displays question-answer request prompt.
the ChatGPT’s responses to theseThe sole input
questions was
in the the APA
required
article format. The left panel displays the ChatGPT’s responses to these questions in the
tabular format. The dots indicate that a portion of the questions and answers were displayed, as therequired
tabular
completeformat.
promptTheand
dots indicate
answers arethat
tooalong
portion
to beofpresented.
the questions and answers were displayed, as the
complete prompt and answers are too long to be presented.
To improve the accuracy of ChatGPT’s responses, we supplemented its understand-
ing At
by this stage, it can
incorporating be concluded
additional that byfrom
information refining the prompt
the articles and incorporating
themselves. We considered addi-
tional
variousarticle information,
sections, includingwe theenhanced the accuracy
titles, abstracts, of ChatGPT’s
methodology responses
descriptions, during
relevant partsthe
information extraction
of the results, phase. This
and conclusions, iterative
as these process
sections oftenallowed
provided usmore
to leverage
detailedtheandstrengths
context- of
ChatGPT while ensuring
rich information compared thetoreliability
abstracts and validity
alone. of the
However, weextracted information.
intentionally excludedNonethe-
article
introductions
less, and related
human oversight and work
criticalsections to maintain
evaluation remainedclarity and avoid
essential confusion.
to validate Figure
and interpret
15 results
the provides an example
obtained fromofChatGPT.
a ChatGPT prompt with a title, abstract, methodology descrip-
tion,Toand ChatGPT’s
overcome the response
limitationtoofthethequestions.
subjective Inevaluation,
this example, wewe used the sameanswered
collaboratively article
as 14
the before, and itfor
questions is evident
a subsetthat
of 30 the qualityalong
articles, of ChatGPT’s
with our responses
responses has significantly
to ChatGPT’s im-
outputs.
proved. For instance,
Remarkably, in Answer
despite the expected 1-1,total
ChatGPT
of 420accurately
individualreported
answers the use
for of
the5814
ultrasonic
questions
sensors,
and and in Answer
30 articles, 4-1, ChatGPT
our answers correctly responses
and ChatGPT’s identified the research to
amounted area’s
381,location.
owing to the
At this stage, it can be concluded that by refining the prompt
classification of 3 articles as irrelevant. The summarized outcomes of these responses and incorporating ad-are
ditional article information, we enhanced the accuracy of ChatGPT’s
presented in Figure 15, while more details about the answers can be found in Table responses during theS7.
information extraction phase. This iterative process allowed us
Among the 381 obtained responses, ChatGPT accurately captured 371, resulting in an to leverage the strengths
of ChatGPT
impressive while ensuring
similarity the reliability
rate exceeding 97%.and validity of the extracted information. None-
theless, human oversight and critical evaluation remained essential to validate and inter-
pret the results obtained from ChatGPT.
Systems 2023, 11, 351 24 of 37
Systems 2023, 11, 351 24 of 38

Figure 14. Illustration of a ChatGPT question-answer request prompt. The sole input was the article
Figure 14. Illustration
titles, abstracts, of a ChatGPT
and methods sectionquestion-answer
portions. The leftrequest prompt.the
panel displays TheChatGPT’s
sole input responses
was the article
to
titles, abstracts, and methods section portions. The left panel displays the ChatGPT’s responses
these questions in the required tabular format. The dots indicate that a portion of the questions and to
answers
Systems 2023, 11,these were displayed,
351 questions as the complete
in the required promptThe
tabular format. anddots
answers are that
indicate too long to be of
a portion presented.
the questions and 25 o
answers were displayed, as the complete prompt and answers are too long to be presented.
To overcome the limitation of the subjective evaluation, we collaboratively answered
the 14 questions for a subset of 30 articles, along with our responses to ChatGPT’s outputs.
Remarkably, despite the expected total of 420 individual answers for the 14 questions and
30 articles, our answers and ChatGPT’s responses amounted to 381, owing to the classifi-
cation of 3 articles as irrelevant. The summarized outcomes of these responses are pre-
sented in Figure 15, while more details about the answers can be found in Table S7.
Among the 381 obtained responses, ChatGPT accurately captured 371, resulting in an im-
pressive similarity rate exceeding 97%.
Regarding discarding articles, both ChatGPT and the authors agreed on the same
articles. However, it is worth noting that ChatGPT’s responses were completely different
for unrelated articles, and it stopped responding to questions (Please refer to Figure S9).
This substantial level of agreement underscores the efficacy of ChatGPT in effectively
comprehending and extracting information from the articles. Upon evaluating the efficacy
of this approach in filtering the initial set of 145 articles, we successfully identified 56 ar-
ticles as irrelevant, enabling us to focus on extracting pertinent information from the re-
maining 86 articles. This demonstrates the valuable role of ChatGPT in streamlining the
article filtration process and automating information extraction from a substantial number
of articles.
Figure 15. A comparison of the ChatGPT’s response to the authors’ general response for the 30 a
Figure 15. A comparison of the ChatGPT’s response to the authors’ general response for the 30 articles
cles in the sample.
in the sample.
Since the snowballing process is an integral part of conducting an SR, we employ
both backward and forward snowballing techniques to uncover additional relevant stu
ies that might have been overlooked during the initial database search [24]. The backwa
snowballing method involves scrutinizing the references of the included papers to id
tify related articles, while the forward snowballing technique entails searching for stud
among the articles that cited the included ones [24]. We manually conducted the sno
Systems 2023, 11, 351 25 of 37

Regarding discarding articles, both ChatGPT and the authors agreed on the same
articles. However, it is worth noting that ChatGPT’s responses were completely different
for unrelated articles, and it stopped responding to questions (Please refer to Figure S9).
This substantial level of agreement underscores the efficacy of ChatGPT in effectively
comprehending and extracting information from the articles. Upon evaluating the effi-
cacy of this approach in filtering the initial set of 145 articles, we successfully identified
56 articles as irrelevant, enabling us to focus on extracting pertinent information from the
remaining 86 articles. This demonstrates the valuable role of ChatGPT in streamlining the
article filtration process and automating information extraction from a substantial number
of articles.
Since the snowballing process is an integral part of conducting an SR, we employed
both backward and forward snowballing techniques to uncover additional relevant studies
that might have been overlooked during the initial database search [24]. The backward
snowballing method involves scrutinizing the references of the included papers to identify
related articles, while the forward snowballing technique entails searching for studies
among the articles that cited the included ones [24]. We manually conducted the snow-
balling process in this study by screening the titles of articles. However, we recognize
the potential of leveraging ChatGPT to automate this step in order to advance the full
automation of the SR process. By implementing the snowballing strategy, we successfully
identified 52 new articles through multiple iterations in addition to the articles previously
identified. These 52 articles underwent the same comprehensive filtration method outlined
earlier in our methodology. As a result, 19 articles were excluded due to their lack of rele-
vance, while the remaining 33 articles met the criteria for inclusion in our review database.
Consequently, the total number of relevant articles included in our review increased to 119.
Overall, leveraging ChatGPT ensures a more thorough filtering process, assists in
extracting information based on responses to comprehensive questions, and enables the
inclusion of snowballing articles, expanding our review’s breadth and scope. By capital-
izing on ChatGPT’s capabilities, we enhance the SR methodology’s efficiency, accuracy,
and reliability.

3.4. Analysis and Interpretation of Extracted Information


This phase focuses on analyzing the content collected in the previous phases, explicitly
emphasizing the sub-categories outlined in Figure 12. The flowchart for phase 3 is illus-
trated in Figure 16, providing a visual representation of the analysis process. In order to
streamline the analysis process, the “Yes” responses to each question were initially com-
piled and organized. Subsequently, these compiled responses were further analyzed and
presented in Table S8. This approach facilitates a cohesive and structured data exploration,
allowing for a more rigorous examination of the insights obtained from ChatGPT.
Accordingly, Table 4 provides a comprehensive overview of the response statistics
obtained during Phase 2 and the corresponding objectives for analyzing each question.
These responses served as prompts for ChatGPT, with a maximum of ten responses per
prompt, covering all sub-categories outlined in Figure 12. The content analysis encom-
passed information extraction related to sensors and sensing technologies, data acquisition
and transmission, data analytics and visualization, and applications and case studies, as
well as limitations and gaps identified in the reviewed articles. Leveraging ChatGPT as
an analytical tool facilitated a more thorough identification of various patterns and trends
within the data analysis process. For example, a specific prompt was designed to explore
the utilization of multiple types of sensors and their associated benefits, as depicted in
Figure 17.
Systems 2023, 11, 351 26 of 38
Systems 2023, 11, 351 26 of 37

Figure 16. Flow chart of Phase 3.

Accordingly, Table 4 provides a comprehensive overview of the response statistics


obtained during Phase 2 and the corresponding objectives for analyzing each question.
These responses served as prompts for ChatGPT, with a maximum of ten responses per
prompt, covering all sub-categories outlined in Figure 12. The content analysis encom-
passed information extraction related to sensors and sensing technologies, data acquisi-
tion and transmission, data analytics and visualization, and applications and case studies,
as well as limitations and gaps identified in the reviewed articles. Leveraging ChatGPT as
an analytical tool facilitated a more thorough identification of various patterns and trends
within the data analysis process. For example, a specific prompt was designed to explore
the utilization of multiple types of sensors and their associated benefits, as depicted in
Figure 17.
16.Flow
Figure 16.
Figure Flowchart
chartofof
Phase 3. 3.
Phase

Accordingly, Table 4 provides a comprehensive overview of the response statistics


obtained during Phase 2 and the corresponding objectives for analyzing each question.
These responses served as prompts for ChatGPT, with a maximum of ten responses per
prompt, covering all sub-categories outlined in Figure 12. The content analysis encom-
passed information extraction related to sensors and sensing technologies, data acquisi-
tion and transmission, data analytics and visualization, and applications and case studies,
as well as limitations and gaps identified in the reviewed articles. Leveraging ChatGPT as
an analytical tool facilitated a more thorough identification of various patterns and trends
within the data analysis process. For example, a specific prompt was designed to explore
the utilization of multiple types of sensors and their associated benefits, as depicted in
Figure 17.

Figure 17. User prompt and ChatGPT answer for the use of different types of sensors.
Figure 17. User prompt and ChatGPT answer for the use of different types of sensors.

Similarly, trends in data transfer technologies were examined based on the responses
to question 2-1 (Figure 12). Figure 18 illustrates ChatGPT’s responses concerning the spe-
cific applications of wireless communication technologies. Furthermore, multiple prompts
were devised within the data analysis and the visualization section. These prompts aided
in exploring diverse approaches employed for data analysis, including AI and ML tech-
niques, as well as visualization methods utilized for decision-making processes (Figure S10).
Additionally, questions 4-1 and 4-2 were integral to the review process, assessing the im-
plementation of proposed systems or case studies in the studied papers while identifying
prevailing trends and scopes (Figure 19). The benefits associated with such implementations
were also investigated within each article (Figure S11).
Systems 2023, 11, 351 27 of 37

Table 4. The gathered responses (yes) for each of the three major categories.

Water Water Wastewater


Sub-
Question Quality Infrastructure Infrastructure Objectives
Category
Monitoring Management Management.
Answers (YES)
1-1: Sensor Identify trends in sensor development
26 28 36
development. and manufacturing, study the
development

advantages of employing several


Sensors’

1-2: Use of different sensors, investigate the frequency of


18 37 19
types of sensors. sensor use, categorize sensors
1-3: sensors according to their functionality, and
performance 21 15 15 investigate the methods used to
evaluation. evaluate sensor performance.
2-1: Data collection Identify trends and anomalies in
and transmission 33 45 38 transmission methods, including the
Data transmission

method. utilization of wireless communications,


the types of wireless technologies
2-2: Use of wireless
31 38 31 employed, and the frequency of their
communication.
occurrence in the examined papers.
2-3: Connectivity Analyze, also, the effectiveness of
performance 7 5 5 utilizing various communication
evaluation. technologies.
3-1: Data analysis
20 33 14
methods.
Data analysis

Define frequently applied data analysis


3-2: Use of ML techniques, including AI and ML
6 11 0
algorithms. techniques, and study the trends in
3-3: Data visualization visualization approaches.
to facilitate 12 19 17
decision-making.
4-1: The use in Identify trends in the implementation of
28 39 28
studies

real-world settings.
Case

IoT-based systems in various real-world


4-2: Benefits and contexts and the outcomes and
17 37 27 advantages of these implementations.
outcomes.
5-1: limitations and
gaps in current 14 24 15 Define the limitations and gaps
limitations

research.
and gaps

identified by the authors, the obstacles


5-2: Implementation encountered in implementing their
25 42 23 systems, the offered solutions, and the
challenges.
recommendations for overcoming them.
5-3: Recommendations
20 37 16
or solutions.

The analysis stage also involved thoroughly examining the limitations and research
gaps discussed in previous studies, along with the corresponding recommendations put
forth by researchers. Leveraging ChatGPT in this phase facilitated a comprehensive ex-
ploration and in-depth understanding of the challenges and limitations encountered in
prior research and the proposed solutions adopted to address them. To ensure a systematic
approach to identifying and categorizing the limitations and challenges discussed by differ-
ent authors, a carefully designed prompt (Figure 20) was employed, utilizing the results
obtained from questions 5-1 and 5-2 in Figure 12.
Systems 2023, 11, 351 28 of 38
Systems2023,
Systems 11,351
2023,11, 351 2828ofof38
37

Figure 18. User prompt and ChatGPT answer for questions related to wireless communication tech-
Figure 18. 18.
Figure UserUser
prompt and and
prompt ChatGPT answer
ChatGPT for questions
answer related
for questions to wireless
related communication
to wireless communica- tech-
nologies.
nologies.
tion technologies.

Figure 19. User prompt and ChatGPT answer for the trends within the proposed systems or case
studies.

Figure 19.TheUser prompt


analysis stageand ChatGPT
also involvedanswer for theexamining
thoroughly trends within the proposed
the limitations andsystems
researchor case
Figure 19. User prompt and ChatGPT answer for the trends within the proposed systems or
studies.
gaps discussed in previous studies, along with the corresponding recommendations put
case studies.
forth by researchers. Leveraging ChatGPT in this phase facilitated a comprehensive ex-
The
ploration analysis
This and
approachstage
in-depth also involved
understanding
allowed thoroughly
of the
for extracting examining
andchallenges
organizing and the insights
limitations
limitations
valuable fromandthe research
encountered in
col-
prior
gaps research
discussed
lected and the proposed
in previous
data. Additionally, solutions
studies, adopted
alonglist
a comprehensive with to corresponding
the address them.
of recommendations Torecommendations
was ensure a drawing
compiled, system- put
atic
from approach
the to
proposedidentifying
solutions and categorizing
identified in the
question limitations
5–3 and and challenges
categorized
forth by researchers. Leveraging ChatGPT in this phase facilitated a comprehensive discussed
based on common by ex-
different
trends authors,
(Figure 21).a carefully designed prompt (Figure 20) was employed,
ploration and in-depth understanding of the challenges and limitations encountered in utilizing the
results obtained
prior This
research andfrom
approach the questions
yielded
proposed 5–1 and
a wealth 5–2 in Figureregarding
of information
solutions adopted to 12. thethem.
address challenges, limitations,
To ensure a system-
and potential solutions found in the reviewed articles. In order to gain a deeper under-
atic approach to identifying and categorizing the limitations and challenges discussed by
standing and assess the extent of the resolved issues, a ChatGPT prompt was utilized
different authors, a carefully designed prompt (Figure 20) was employed, utilizing the
results obtained from questions 5–1 and 5–2 in Figure 12.
Systems2023,
Systems 11,351
2023,11, 351 2929ofof38
37

to compare the limitations and the challenges highlighted by various authors with the
suggested solutions and recommendations. This comparative analysis provided valuable
Systems 2023, 11, 351 29 of 38
insights into the existing research gaps and identified areas for further investigation and
research. An example depicting the resulting research gaps is illustrated in Figure 22.

Figure 20. User prompt and ChatGPT answer to identify and categorize the limitations and chal-
lenges discussed by previous authors.

This approach allowed for extracting and organizing valuable insights from the col-
lected data. Additionally, a comprehensive list of recommendations was compiled, draw-
Figure
ing 20. the
from Userproposed
prompt and ChatGPT
solutions answer to identify
identified and categorize the limitations and chal-
Figure 20. User prompt and ChatGPT answer toin question
identify 5–3 and categorized
and categorize based
the limitations on com-
and challenges
lenges discussed by previous authors.
mon trends
discussed by(Figure
previous21).
authors.
This approach allowed for extracting and organizing valuable insights from the col-
lected data. Additionally, a comprehensive list of recommendations was compiled, draw-
ing from the proposed solutions identified in question 5–3 and categorized based on com-
mon trends (Figure 21).

Figure 21. User prompt and ChatGPT answer to generate a comprehensive list of the compiled rec-
Figure 21. User prompt and ChatGPT answer to generate a comprehensive list of the com-
ommendations.
piled recommendations.

Figure 21. User prompt and ChatGPT answer to generate a comprehensive list of the compiled rec-
ommendations.
This approach yielded a wealth of information regarding the challenges, limitations,
and potential solutions found in the reviewed articles. In order to gain a deeper under-
standing and assess the extent of the resolved issues, a ChatGPT prompt was utilized to
compare the limitations and the challenges highlighted by various authors with the sug-
Systems 2023, 11, 351 gested solutions and recommendations. This comparative analysis provided valuable 30 ofin-
37
sights into the existing research gaps and identified areas for further investigation and
research. An example depicting the resulting research gaps is illustrated in Figure 22.

Figure 22. User prompt and ChatGPT answer for comparing the limitations and challenges high-
Figure 22.
lighted User prompt
by various and
authors ChatGPT
with answersolutions
the suggested for comparing the limitations and challenges high-
and recommendations.
lighted by various authors with the suggested solutions and recommendations.
4.
4. ChatGPT
ChatGPT Strengths,
Strengths, Limitations,
Limitations, and
and Future
Future Directions
Directions in in Automating
Automating SR SR Process
Process
ChatGPT,
ChatGPT, built on the GPT-3.5 architecture, represents a significant breakthrough in
built on the GPT-3.5 architecture, represents a significant breakthrough in
AI
AI research, enabling
research, enabling the
the generation
generation of of coherent
coherent and
and meaningful
meaningful human-like
human-like language
language by
by
leveraging
leveraging vast
vast amounts
amounts of of language
language data.
data. This
This innovative
innovative language
language model
model holds
holds promise
promise
for
for various domains, including systematic reviews, and can potentially contribute to
various domains, including systematic reviews, and can potentially contribute to the
the
advancement of general artificial intelligence. However, it is important
advancement of general artificial intelligence. However, it is important to acknowledgeto acknowledge
that,
that, being
being aa generative
generative model,
model, ChatGPT
ChatGPT cannot
cannot guarantee
guarantee thethe absolute
absolute accuracy
accuracy of
of its
its
outputs.
outputs. Therefore,
Therefore,thisthissection
sectionwill explore
will thethe
explore strengths, limitations,
strengths, potential
limitations, areas
potential for
areas
enhancement,
for enhancement, and and
future research
future directions
research concerning
directions ChatGPT
concerning in theincontext
ChatGPT of con-
the context of
ducting SRs.
conducting SRs.

Strengths of
4.1. Strengths of ChatGPT
ChatGPT in SR Process
ChatGPT has
ChatGPT has been
been proven
proven toto be a valuable tool in the SR process, offering several
strengths that enhance the efficiency and effectiveness
strengths that enhance the efficiency and effectiveness of
of the
the methodology. Through
Through our
our
methodology and evaluation, we have identified the following key strengths of ChatGPT
methodology and evaluation, we have identified the following key strengths of ChatGPT
in conducting SRs:
1. FullAutomation:
Full Automation:ChatGPT
ChatGPTcontributes
contributesto
toautomating
automatingseveral
severaltasks
tasksin
inthe
theSR
SRprocess,
process,
suchas
such asgenerating
generatingresearch
researchquestions,
questions, suggesting
suggesting BRTs,
BRTs, categorizing
categorizing thethe relevant
relevant ar-
articles,
ticles, discarding
discarding unrelated
unrelated ones,proposing
ones, proposingsub-categories
sub-categoriestotobebecovered
covered for
for each
each
main category,
main category, generating
generatingresearch
researchquestions
questionstoto
aid in in
aid information extraction
information from
extraction the
from
articles, and extracting all relevant information. This level of automation facilitated
by ChatGPT helps streamline the SR process and decrease the time and errors.
2. Enhanced accuracy and efficiency: ChatGPT offers a valuable advantage by improving
the accuracy and efficiency of filtering and classifying articles. Researchers can
benefit from its ability to swiftly identify relevant studies, reducing uncertainty,
and saving significant time and effort. Moreover, ChatGPT’s proficiency in natural
language processing aids in precise content analysis, minimizing the risk of errors,
and omissions in research interpretation.
Systems 2023, 11, 351 31 of 37

3. Time-saving: ChatGPT demonstrates significant potential in saving time during SRs,


which are known to be time-consuming and resource-intensive processes that require
high levels of expertise and attention to detail. ChatGPT assists in this process by
swiftly analyzing and summarizing large volumes of the literature, aiding researchers
in identifying relevant studies and extracting key information more efficiently. In
our study, ChatGPT played a significant role in tasks such as filtering, categorizing,
and content analysis, which resulted in decreased time and effort as well as reduced
sources of uncertainty. However, it is important to note that human experts should
carefully review ChatGPT-generated summaries.
4. Improved reproducibility: While ChatGPT’s responses were found to be influenced by
the user prompts, the same procedure can be replicated multiple times by following
the same guidelines and adhering to the recommended approach. This enhances the
reproducibility of the results, allowing for consistent outcomes to be obtained through
repeated application of the methodology. ChatGPT’s responses are markedly affected
by the user prompts, and the same procedure can be reproduced several times by
conducting the same procedures and following the recommendations.
5. Flexibility: The method introduced utilizing ChatGPT for automating the SR process
can be applied for conducting SRs across various fields. This flexibility allows for the
potential utilization of ChatGPT in various research domains, providing opportunities
for its application beyond the specific context of the current study.

4.2. Limitations of ChatGPT in SR Process


ChatGPT, despite its strengths, also has certain limitations that need to be considered
when applying it to the SR methodology. These limitations arise from the nature of the
model and the challenges associated with its implementation in complex research tasks.
Understanding these limitations and constraints is considered crucial to ensuring the
appropriate use and interpretation of ChatGPT-generated outputs in the SR process. This
subsection discusses the limitations of ChatGPT in the context of SR methodology and
identifies improvement opportunities. Our study has uncovered the following limitations:
1. Limited ability to extract full-text articles: Despite ChatGPT’s capability to suggest
and adjust BSTs based on user requests, it is not optimized for article extraction, which
may impact the comprehensiveness of the SR. As a result, ChatGPT’s limitations in
extracting articles may constrain the SR process’s completeness.
2. Limited ability to extract all information from articles: Despite ChatGPT’s capability
to filter, categorize articles, and extract text information, it may encounter limitations
in extracting all relevant information, especially if the information is presented in
non-standard formats such as figures or other non-textual forms. This may result
in incomplete extraction of relevant data, particularly from articles that utilize non-
traditional data presentation methods, potentially impacting the comprehensiveness
and accuracy of the extracted information during the SR process.
3. Dependence on input data: ChatGPT’s performance highly depends on the input data
quality. If the data is biased or incomplete, GPT’s output may be similarly flawed.
4. Limited Access to Real-Time Data: One notable drawback of ChatGPT in its appli-
cation to automating the SR process pertains to its dependence on a pre-existing
database. ChatGPT relies solely on the information it was trained on, lacking access
to real-time data from the internet. Consequently, the model’s knowledge and com-
prehension are confined to the training data, limiting its ability to incorporate the
latest research studies, publications, and emerging evidence. This limitation poses
challenges in providing comprehensive and up-to-date information throughout the
systematic review process.
5. Length of prompts: While ChatGPT has the ability to generate high-quality responses,
the length and complexity of the prompts used can impact the accuracy and coherence
of the generated text. Our study revealed that longer prompts tended to result in
more accurate and relevant responses, but also required more time and effort to
Systems 2023, 11, 351 32 of 37

prepare. Conversely, shorter prompts were easier and quicker to generate, but may
have led to less accurate or incomplete responses. Hence, balancing the prompt’s
length and complexity with the generated text’s accuracy and relevance is important.
Additionally, careful consideration should be given to the prompt formulation process
to ensure that the generated responses meet the desired quality standards in the
context of the SR process.
6. Token limitations: ChatGPT limits the number of tokens that can be processed simulta-
neously. This means that the length of the input sequence (i.e., prompt plus generated
text) is limited and may require multiple iterations or segmentation to generate longer
responses. Our study encountered this limitation when attempting to generate longer
responses. This limitation can affect the efficiency and effectiveness of the ChatGPT’s
model for certain tasks, especially in Phase 2, where the filtration occurred by feeding
the ChatGPT with some parts from the article.
7. Memory limitations: The ChatGPT ‘s ability to recall previous prompts and maintain
a coherent and accurate discourse on a specific topic is a crucial consideration, as it
can impose constraints that impact its scalability and applicability to certain tasks.
Within our study, we encountered restrictions related to memory capacity, wherein
ChatGPT occasionally struggled to provide responses that remained focused on the
precise topic, leading to deviations or inaccuracies in its understanding of our prompts.
This was particularly noticeable when working with large datasets or engaging in
multiple iterations, highlighting the potential impact of memory limitations on the
model’s performance.

4.3. Future Perspectives: Expanding the Potential of ChatGPT in SR


As technology advances and AI-driven language models such as ChatGPT become
more sophisticated, there are exciting opportunities for further development and utilization
in the field of SR. The future perspectives of ChatGPT in SR offer potential avenues for
enhancing the review process’s efficiency, accuracy, and comprehensiveness. By addressing
existing challenges and building upon the strengths of ChatGPT, researchers can unlock its
full potential in advancing evidence synthesis and knowledge discovery. This subsection
explores some of the future perspectives and areas of improvement for ChatGPT in the SR
methodology, including:
1. Conducting the snowballing procedure using ChatGPT: This approach involves utiliz-
ing ChatGPT to search the database using BSTs, applying the first round of filtering
based on abstracts, and then collecting remaining articles along with their references
(backward) and cited publications (forward). These collected articles would undergo
another round of abstract screening before proceeding to the second level of filtering.
Automating the snowballing procedure with ChatGPT could streamline the filtration
process, making it more efficient and time-saving for researchers.
2. Developing more sophisticated algorithms to extract information from articles: Ad-
vanced techniques such as entity recognition and topic modeling could be employed
to enhance the accuracy and precision of information extraction from articles. These
techniques can enable ChatGPT to identify and extract relevant information more
effectively, particularly from non-standard formats such as tables, figures, and other
complex structures commonly found in scholarly literature.
3. Improving the interpretability of ChatGPT’s output: Efforts could be made to develop
tools or techniques to visualize and comprehend ChatGPT’s output. This may involve
creating visual representations or graphical displays that aid in understanding the
generated summaries or recommendations. Additionally, developing more trans-
parent algorithms, which are easier for researchers to comprehend, can improve the
interpretability of ChatGPT’s output.
4. Expanding the scope of input data for ChatGPT: One potential avenue for enhancing
the performance of ChatGPT in conducting SRs could be to explore the model’s
applicability on data from fields with more relevant articles. This could involve
Systems 2023, 11, 351 33 of 37

testing the content analysis capabilities of ChatGPT by inputting a large amount of


data and examining the conclusions drawn by the model. Additionally, employing
ChatGPT on data from new fields can serve as a valuable means to test the robustness
and integrity of the developed methodology in response to different aspects.
5. Access to Real-Time Data: The SR process using ChatGPT can benefit from several
avenues for improvement. Firstly, ChatGPT can provide accurate, current information
regarding articles based on real-time access to databases, such as Scopus and Web
of Science. In addition, internet connectivity enhances data retrieval and screen-
ing capabilities by allowing users to access a broader range of sources. Secondly,
dynamic search strategies enable real-time feedback to be integrated into iterative
enhancements. Thirdly, automated citation management and reference management,
integration of collaborative platforms, and access to diverse perspectives and global
research materials enhance the SR process. However, the success of these enhance-
ments critically hinges on the particular implementation, ethical considerations, and
rigorous validation of retrieved information.
Overall, it is essential to embrace the development of AI and use it with caution and
supervision in critical domains. While ChatGPT offers significant potential in automat-
ing SR processes, it is essential to acknowledge and address its limitations. Strategies
for enhancing ChatGPT’s performance in conducting SRs should be carefully devised
and implemented.

5. Ethical Considerations in Utilizing AI-Language Models


The utilization of AI language models such as ChatGPT in scientific writing necessi-
tates careful attention to ethical considerations. Integrating these models raises important
questions that require thorough examination and appropriate safeguards. One crucial
ethical consideration in utilizing AI language models is the validation, verification, and
critical evaluation of AI-generated outputs in order to ensure their accuracy, reliability,
and appropriate contextualization within the broader scientific knowledge. In this regard,
the involvement of human experts is paramount. Their supervision and expertise play a
critical role in aligning the outputs with established standards, identifying and rectifying
potential inaccuracies or biases, and providing a comprehensive and accurate interpretation
of the AI-generated content. By incorporating human judgment and critical evaluation,
researchers uphold responsible practices that enhance the reliability and credibility of the
findings derived from AI language models.
Ethical considerations also encompass aspects such as data privacy, informed consent,
and bias mitigation strategies. Researchers must adhere to established guidelines and
regulations to protect data privacy when utilizing AI language models. This involves han-
dling sensitive or personal information with utmost care and ensuring strict confidentiality
to comply with privacy standards. Obtaining informed consent becomes crucial when
utilizing data collected from individuals or sources with sensitive information. Moreover,
researchers must proactively implement strategies to identify and mitigate biases that may
arise from the input data used in the automated SR process, ensuring fair and unbiased out-
comes. By conscientiously addressing these ethical considerations, researchers contribute
to the cultivation of a responsible and ethical environment for the utilization of AI language
models in scientific writing.

6. Concluded Remarks and Recommendations


Our study presents a novel methodology for conducting systematic reviews by lever-
aging the power of ChatGPT. By combining the strengths of human expertise and AI
capabilities, we aimed to streamline the traditional SR process and improve its efficiency
and accuracy. Our study applied this method to conduct a comprehensive SR on IoT appli-
cations in water and wastewater infrastructure management, and our findings highlight
the benefits of using ChatGPT in each step of the process. Our study revealed that ChatGPT
effectively generates research questions and suggests Boolean research terms, but not appro-
Systems 2023, 11, 351 34 of 37

priate for article extraction. However, it performs excellently in filtering and categorizing
articles and excellently in full-text filtration and information extraction after preparing
prompts. Our comprehensive content analysis of the selected publications revealed valu-
able insights into the current research landscape, highlighting emerging trends, identifying
research gaps, and shedding light on future directions in the domains of IoT-based sensing
and monitoring, data analytics and visualization, as well as applications and case studies.
We evaluated our methodology using quantitative comparisons with traditional review
techniques and expert opinions, and the results show that our approach significantly saves
time and effort while maintaining high levels of accuracy. Our findings demonstrate the
potential of ChatGPT in improving the efficiency and accuracy of SRs, contributing to the
advancement of scientific knowledge. In conclusion, there are promising avenues for future
research in fully exploring the capabilities of ChatGPT in SRs, investigating its limitations in
diverse research contexts, and applying our approach to other fields to further enhance the
efficiency and accuracy of SRs. We strongly recommend adopting our proposed framework
as a reliable guide for conducting SRs in diverse domains. Our proposed framework,
as depicted in Figure 23, provides a robust foundation for automating the SR process,
offering adaptability and scalability to accommodate research complexities. By recognizing
Systems 2023, 11, 351 the strengths and limitations of ChatGPT and taking appropriate measures to enhance 35 of 38
its performance, researchers can maximize the benefits of AI in evidence synthesis while
ensuring the precision and integrity of SRs in the scientific community.

Figure 23. Automated Framework for Streamlining SR Methodology: A Proposed Approach.


Figure 23. Automated Framework for Streamlining SR Methodology: A Proposed Approach.

Supplementary Materials: The following supporting information can be downloaded at:


https://www.mdpi.com/article/10.3390/systems11070351/s1, Figure S1: Initialization Process. (a-e)
Introducing Iot Technology.; Figure S2: Initialization Process. (a-d) Introducing Civil Engineering
Infrastructure.; Figure S3: Initialization Process. (a-d) Introducing Water and Wastewater Infrastruc-
ture.; Figure S4: Initialization Process. (a-d) Implementing IoT In Water and Wastewater Infrastruc-
ture.; Figure S5: Initialization Process. (a-d) Investigating the Systematic Review Capability.; Figure
Systems 2023, 11, 351 35 of 37

Supplementary Materials: The following supporting information can be downloaded at: https://
www.mdpi.com/article/10.3390/systems11070351/s1, Figure S1: Initialization Process. (a–e) In-
troducing IoT Technology; Figure S2: Initialization Process. (a–d) Introducing Civil Engineering
Infrastructure; Figure S3: Initialization Process. (a–d) Introducing Water and Wastewater Infras-
tructure; Figure S4: Initialization Process. (a–d) Implementing IoT In Water and Wastewater Infras-
tructure; Figure S5: Initialization Process. (a–d) Investigating the Systematic Review Capability;
Figure S6: ChatGPT’s Utilization of BSTs. (a–e) Extracting Search Keywords; Figure S7: Exam-
ples of references from ChatGPT. (a) Extracting related paper based on the Boolean search term.
(b) Example of one of the incorrect references. Figure S8: A section of the questionnaire created
using Google Forms; Figure S9: Two examples of ChatGPT’s responses in case of irrelevant articles;
Figure S10: User prompt and ChatGPT answer to the methods used for data analysis and visual-
ization; Figure S11: User prompt and ChatGPT answer for the benefits of implementing the case
studies. Table S1: Unique keywords as extracted from ChatGPT and VosViewer; Table S2: Comparison
between ChatGPT and human experts in classification process for Selected 120 articles; Table S3: Cat-
egorization of all articles using ChatGPT (APA+Abstract); Table S4: Articles belong to IoT-based
water quality monitoring as classified using ChatGPT with explanation; Table S5: Articles belong
to IoT-based wastewater infrastructure management as classified using ChatGPT with explanation;
Table S6: Articles belong to IoT-based water infrastructure management as classified using ChatGPT
with explanation; Table S7: Comparison between answers form ChatGPT and human experts for the
14 questions related to the five subcategorizes for selected 30 articles; Table S8: ChatGPT responses
to the 14 questions with Yes/No and the detailed description for the answers. (a) IoT-based water
infrastructure management, (b) IoT-based wastewater infrastructure management, and (c) IoT-based
water quality monitoring.
Author Contributions: Conceptualization, A.A., E.A. and M.E.; methodology, A.A., E.A. and M.E.;
validation, A.A., E.A. and M.E.; formal analysis, A.A., E.A. and M.E.; investigation, E.A. and A.E.E.E.;
writing—original draft preparation, A.A., E.A. and M.E.; writing—review and editing, E.A., A.E.E.E.
and A.A.; visualization, M.E., E.A. and A.A.; supervision, E.A., A.E.E.E. and T.Z.; project administra-
tion, A.E.E.E. and T.Z.; funding acquisition, A.E.E.E. and T.Z. All authors have read and agreed to the
published version of the manuscript.
Funding: This work was supported by the University Grant Committee of Hong Kong Polytechnic
University: [Grant Number Project No. P0036181].
Data Availability Statement: Not applicable.
Acknowledgments: The Author would like to thank greatly the volunteers who participated in the
filtering process.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Paré, G.; Trudel, M.-C.; Jaana, M.; Kitsiou, S. Synthesizing Information Systems Knowledge: A Typology of Literature Reviews.
Inf. Manag. 2015, 52, 183–199. [CrossRef]
2. Yuan, Y.; Hunt, R.H. Systematic Reviews: The Good, the Bad and the Ugly. Am. J. Gastroenterol. 2009, 104, 1086–1092. [CrossRef]
[PubMed]
3. Kitchenham, B. Procedures for Performing Systematic Reviews; Keele University: Keele, UK, 2004.
4. Mulrow, C.D. Systematic Reviews: Rationale for Systematic Reviews. BMJ 1994, 309, 597–599. [CrossRef] [PubMed]
5. Needleman, I.G. A Guide to Systematic Reviews. J. Clin. Periodontol. 2002, 29, 6–9. [CrossRef]
6. Agbo, C.; Mahmoud, Q.; Eklund, J. Blockchain Technology in Healthcare: A Systematic Review. Healthcare 2019, 7, 56. [CrossRef]
7. FitzGerald, C.; Hurst, S. Implicit Bias in Healthcare Professionals: A Systematic Review. BMC Med. Ethics 2017, 18, 19. [CrossRef]
8. Milne-Ives, M.; de Cock, C.; Lim, E.; Shehadeh, M.H.; de Pennington, N.; Mole, G.; Normando, E.; Meinert, E. The Effectiveness of
Artificial Intelligence Conversational Agents in Health Care: Systematic Review. J. Med. Internet Res. 2020, 22, e20346. [CrossRef]
9. Abu-Odah, H.; Su, J.; Wang, M.; Lin, S.-Y.; Bayuo, J.; Musa, S.S.; Molassiotis, A. Palliative Care Landscape in the COVID-19 Era:
Bibliometric Analysis of Global Research. Healthcare 2022, 10, 1344. [CrossRef]
10. Aarseth, W.; Ahola, T.; Aaltonen, K.; Økland, A.; Andersen, B. Project Sustainability Strategies: A Systematic Literature Review.
Int. J. Proj. Manag. 2017, 35, 1071–1083. [CrossRef]
11. Shaban, I.A.; Eltoukhy, A.E.E.; Zayed, T. Systematic and Scientometric Analyses of Predictors for Modelling Water Pipes
Deterioration. Autom. Constr. 2023, 149, 104710. [CrossRef]
12. Silva, M. A Systematic Review of Foresight in Project Management Literature. Procedia Comput. Sci. 2015, 64, 792–799. [CrossRef]
Systems 2023, 11, 351 36 of 37

13. Karam, A.; Eltoukhy, A.E.E.; Shaban, I.A.; Attia, E.-A. A Review of COVID-19-Related Literature on Freight Transport: Impacts,
Mitigation Strategies, Recovery Measures, and Future Research Directions. Int. J. Environ. Res. Public Health 2022, 19, 12287.
[CrossRef] [PubMed]
14. Araújo, A.G.; Pereira Carneiro, A.M.; Palha, R.P. Sustainable Construction Management: A Systematic Review of the Literature
with Meta-Analysis. J. Clean. Prod. 2020, 256, 120350. [CrossRef]
15. Hussein, M.; Eltoukhy, A.E.E.; Karam, A.; Shaban, I.A.; Zayed, T. Modelling in Off-Site Construction Supply Chain Management:
A Review and Future Directions for Sustainable Modular Integrated Construction. J. Clean. Prod. 2021, 310, 127503. [CrossRef]
16. Taiwo, R.; Shaban, I.A.; Zayed, T. Development of Sustainable Water Infrastructure: A Proper Understanding of Water Pipe
Failure. J. Clean. Prod. 2023, 398, 136653. [CrossRef]
17. Michalski, A.; Głodziński, E.; Böde, K. Lean Construction Management Techniques and BIM Technology—Systematic Literature
Review. Procedia Comput. Sci. 2022, 196, 1036–1043. [CrossRef]
18. Abdelkader, E.M.; Zayed, T.; Faris, N. Synthesized Evaluation of Reinforced Concrete Bridge Defects, Their Non-Destructive
Inspection and Analysis Methods: A Systematic Review and Bibliometric Analysis of the Past Three Decades. Buildings
2023, 13, 800. [CrossRef]
19. Elshaboury, N.; Al-Sakkaf, A.; Mohammed Abdelkader, E.; Alfalah, G. Construction and Demolition Waste Management Research:
A Science Mapping Analysis. Int. J. Environ. Res. Public Health 2022, 19, 4496. [CrossRef]
20. Eltoukhy, A.E.E.; Chan, F.T.S.; Chung, S.H. Airline Schedule Planning: A Review and Future Directions. Ind. Manag. Data Syst.
2017, 117, 1201–1243. [CrossRef]
21. Hassan, L.K.; Santos, B.F.; Vink, J. Airline Disruption Management: A Literature Review and Practical Challenges. Comput. Oper.
Res. 2021, 127, 105137. [CrossRef]
22. Aromataris, E.; Riitano, D. Systematic Reviews. AJN Am. J. Nurs. 2014, 114, 49–56. [CrossRef] [PubMed]
23. Meline, T. Selecting Studies for Systemic Review: Inclusion and Exclusion Criteria. Contemp. Issues Commun. Sci. Disord. 2006, 33,
21–27. [CrossRef]
24. Wohlin, C. Guidelines for Snowballing in Systematic Literature Studies and a Replication in Software Engineering. In Proceedings
of the 18th International Conference on Evaluation and Assessment in Software Engineering, London, UK, 13–14 May 2014; ACM:
New York, NY, USA, 2014; pp. 1–10.
25. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The
PRISMA Statement. Int. J. Surg. 2010, 8, 336–341. [CrossRef] [PubMed]
26. Sarkis-Onofre, R.; Catalá-López, F.; Aromataris, E.; Lockwood, C. How to Properly Use the PRISMA Statement. Syst. Rev.
2021, 10, 117. [CrossRef]
27. Aydın, Ö.; Karaarslan, E. OpenAI ChatGPT Generated Literature Review: Digital Twin in Healthcare. SSRN Electron. J. 2022.
[CrossRef]
28. Cascella, M.; Montomoli, J.; Bellini, V.; Bignami, E. Evaluating the Feasibility of ChatGPT in Healthcare: An Analysis of Multiple
Clinical and Research Scenarios. J. Med. Syst. 2023, 47, 33. [CrossRef]
29. Vaishya, R.; Misra, A.; Vaish, A. ChatGPT: Is This Version Good for Healthcare and Research? Diabetes Metab. Syndr. Clin. Res.
Rev. 2023, 17, 102744. [CrossRef]
30. Halaweh, M. ChatGPT in Education: Strategies for Responsible Implementation. Contemp. Educ. Technol. 2023, 15, ep421.
[CrossRef]
31. Kung, T.H.; Cheatham, M.; Medenilla, A.; Sillos, C.; De Leon, L.; Elepaño, C.; Madriaga, M.; Aggabao, R.; Diaz-Candido, G.;
Maningo, J.; et al. Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language
Models. PLOS Digit. Health 2023, 2, e0000198. [CrossRef]
32. Zhai, X. ChatGPT for Next Generation Science Learning. XRDS Crossroads ACM Mag. Stud. 2023, 29, 42–46. [CrossRef]
33. Rudolph, J.; Tan, S.; Tan, S. ChatGPT: Bullshit Spewer or the End of Traditional Assessments in Higher Education? J. Appl. Learn.
Teach. 2023, 6, 342–362. [CrossRef]
34. Prieto, S.A.; Mengiste, E.T.; García de Soto, B. Investigating the Use of ChatGPT for the Scheduling of Construction Projects.
Buildings 2023, 13, 857. [CrossRef]
35. You, H.; Ye, Y.; Zhou, T.; Zhu, Q.; Du, J. Robot-Enabled Construction Assembly with Automated Sequence Planning Based on
ChatGPT: RoboGPT. arXiv 2023, arXiv:2304.11018.
36. Alkaissi, H.; McFarlane, S.I. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus 2023, 15, e35179.
[CrossRef] [PubMed]
37. Salvagno, M.; Taccone, F.S.; Gerli, A.G. Can Artificial Intelligence Help for Scientific Writing? Crit. Care 2023, 27, 75. [CrossRef]
[PubMed]
38. Zheng, H.; Zhan, H. ChatGPT in Scientific Writing: A Cautionary Tale. Am. J. Med. 2023. [CrossRef]
39. Dergaa, I.; Chamari, K.; Zmijewski, P.; Ben Saad, H. From Human Writing to Artificial Intelligence Generated Text: Examining the
Prospects and Potential Threats of ChatGPT in Academic Writing. Biol. Sport 2023, 40, 615–622. [CrossRef]
40. Khosravi, H.; Shafie, M.R.; Hajiabadi, M.; Raihan, A.S.; Ahmed, I. Chatbots and ChatGPT: A Bibliometric Analysis and Systematic
Review of Publications in Web of Science and Scopus Databases. arXiv 2023, arXiv:2304.05436.
41. Lecler, A.; Duron, L.; Soyer, P. Revolutionizing Radiology with GPT-Based Models: Current Applications, Future Possibilities and
Limitations of ChatGPT. Diagn. Interv. Imaging 2023, 104, 269–274. [CrossRef]
Systems 2023, 11, 351 37 of 37

42. Hosseini, M.; Horbach, S.P.J.M. Fighting Reviewer Fatigue or Amplifying Bias? Considerations and Recommendations for Use of
ChatGPT and Other Large Language Models in Scholarly Peer Review. Res. Integr. Peer. Rev. 2023, 8, 4. [CrossRef]
43. Fang, T.; Yang, S.; Lan, K.; Wong, D.F.; Hu, J.; Chao, L.S.; Zhang, Y. Is ChatGPT a Highly Fluent Grammatical Error Correction
System? A Comprehensive Evaluation. arXiv 2023, arXiv:2304.01746.
44. Sallam, M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives
and Valid Concerns. Healthcare 2023, 11, 887. [CrossRef] [PubMed]
45. Qureshi, R.; Shaughnessy, D.; Gill, K.A.R.; Robinson, K.A.; Li, T.; Agai, E. Are ChatGPT and Large Language Models “the Answer”
to Bringing Us Closer to Systematic Review Automation? Syst. Rev. 2023, 12, 72. [CrossRef] [PubMed]
46. Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46. [CrossRef]
47. Zeng, G. On the Confusion Matrix in Credit Scoring and Its Analytical Properties. Commun. Stat. Theory Methods 2020, 49,
2080–2093. [CrossRef]
48. Jan, F.; Min-Allah, N.; Saeed, S.; Iqbal, S.Z.; Ahmed, R. IoT-Based Solutions to Monitor Water Level, Leakage, and Motor Control
for Smart Water Tanks. Water 2022, 14, 309. [CrossRef]
49. Singh, M.; Ahmed, S. IoT Based Smart Water Management Systems: A Systematic Review. Mater. Today Proc. 2021, 46, 5211–5218.
[CrossRef]
50. Zulkifli, C.Z.; Garfan, S.; Talal, M.; Alamoodi, A.H.; Alamleh, A.; Ahmaro, I.Y.Y.; Sulaiman, S.; Ibrahim, A.B.; Zaidan, B.B.; Ismail,
A.R.; et al. IoT-Based Water Monitoring Systems: A Systematic Review. Water 2022, 14, 3621. [CrossRef]
51. Alshami, A.; Elsayed, M.; Mohandes, S.R.; Kineber, A.F.; Zayed, T.; Alyanbaawi, A.; Hamed, M.M. Performance Assessment of
Sewer Networks under Different Blockage Situations Using Internet-of-Things-Based Technologies. Sustainability 2022, 14, 14036.
[CrossRef]
52. Haluza, D.; Jungwirth, D. Artificial Intelligence and Ten Societal Megatrends: An Exploratory Study Using GPT-3. Systems
2023, 11, 120. [CrossRef]
53. Yang, X.; Li, Y.; Zhang, X.; Chen, H.; Cheng, W. Exploring the Limits of ChatGPT for Query or Aspect-Based Text Summarization.
arXiv 2023, arXiv:2302.08081.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

You might also like