A Review of Computer Vision-Based Monitoring Approaches
Digital Object Identifier 10.1109/ACCESS.2024.3350773
ABSTRACT Construction workers' behaviors directly affect labor productivity and their own safety, thereby influencing project quality. Recognizing and monitoring construction-related behaviors is therefore crucial for high-quality management and orderly construction site operation. Recent strides in computer vision technology suggest its potential to replace traditional manual supervision approaches. This paper explores research on monitoring construction workers' behaviors using computer vision. Through bibliometrics and content-based analysis, the authors present the latest research in this area from three perspectives: "Detection, Localization, and Tracking for Construction Workers," "Recognition of Workers' Construction Activities," and "Occupational Health and Safety Behavior Monitoring." The volume of literature in this field has increased notably, and safety-related literature predominates, underscoring the concern for occupational health. Among vision algorithms, the use of object detection has grown markedly. The ongoing and future research trajectory is anticipated to involve multi-algorithm integration and an emphasis on enhancing robustness. The authors then summarize the review in terms of engineering impact and technical suitability, and analyze the limitations of current research from the perspectives of technical approaches and application scenarios. Finally, the paper discusses future research directions in this field together with generative AI models. The authors hope this paper can serve as a valuable reference for both scholars and engineers.
INDEX TERMS Computer vision, Construction worker, Construction behavior, Construction site,
Monitoring
and computer vision technology, are being employed in engineering construction supervision.

The primary types of sensors used for monitoring construction workers are position sensors [11] and acceleration sensors [12]. However, using these sensors requires attaching them to the worker, which can affect workers' comfort and increase equipment maintenance costs [13]. Audio-based approaches cannot identify construction tasks that produce little or no sound [14]. In the field of civil engineering, computer vision has become a high-precision, non-contact monitoring method, applied to tasks such as crack monitoring [15]-[17], strain measurement [18], [19], and damage recognition [20], [21]. Vision-based approaches can also record worker activities in real time, allowing project managers to track construction site productivity, progress, and safety risks, and to make timely decisions about project progress. Using computer vision to achieve intelligent control of smart construction sites is not only the next research frontier but also the driving force behind the development of smart construction sites [22].

Nonetheless, fewer studies have been conducted to monitor worker behavior than to assess building structure damage, as indicated by Mostafa and Hegazy [23]. In recent years, several comprehensive reviews have been conducted focusing on safety concerns in engineering construction. Zhang et al. [24] and Fang et al. [25] explored the application of computer vision technology to worker safety. From an economic development perspective, Luo et al. [26] elaborated on construction safety issues, emphasizing emerging trends such as deep learning and interdisciplinary technologies. Additionally, Zeng et al. pointed to artificial intelligence as a direction for construction safety research [27]. Construction workers' behavior plays a pivotal role in influencing construction quality [2]. Their trajectories, work behaviors, and safety practices need to be seamlessly integrated into intelligent management systems [24], [28]. In recent years, the number of works using computer vision to study construction workers' behavior has increased year by year, and the algorithmic techniques are constantly updated and iterated, so a comprehensive review is needed to summarize the latest related literature and to clarify current research trends and status, challenges, and limitations. At the same time, we hope that this paper can provide a useful reference for engineers and support innovation in construction management.

To achieve this objective, the remainder of the paper is organized as follows: Chapter 2 outlines the methodology employed for the literature search, Chapter 3 describes the state of the literature, Chapter 4 presents the computer vision techniques utilized in the reviewed research, Chapter 5 covers application scenarios of computer vision techniques for workers' construction-related behaviors, Chapter 6 presents summaries from the technical and practical application perspectives, and finally, research gaps and limitations are identified and potential future research directions are suggested.

II. RESEARCH METHODOLOGY
Figure 1 shows the technical framework of this article. Initially, the authors establish the study's background in the introduction, providing the groundwork for subsequent research. Following this, a meticulous screening and analysis were conducted on 137 publications published after 2013, all of which are pertinent to the work behaviors of construction workers. Subsequently, based on the research content within these articles, the examination and advancement of knowledge are executed from three perspectives: the localization and tracking of construction workers, the recognition of construction activities, and the monitoring of occupational health and safety behaviors. These three dimensions comprehensively encompass the critical issues of workers' movement trajectories, construction activities, and safety practices, all of which exert a direct or indirect influence on project quality control and the safety of construction personnel. Building upon the insights gleaned from the literature review, a comprehensive synthesis and discussion are presented, elucidating the current state of research and delineating prospective avenues for future investigation.
The authors conducted a search of relevant literature using Web of Science (WOS). The search string used was: (worker AND construction) AND ("computer vision" OR "vision-based" OR "deep learning" OR "image processing" OR vision). The retrieval process comprised three sequential phases. In the initial stage, utilizing the aforementioned string, 596 scholarly articles from journal publications and conference papers were obtained from the WOS core collection. In the second stage, review papers were excluded and literature from 2013 to June 2023 was retained, leaving 496 papers. The third stage involved a careful manual review of the abstract, keywords, and body content to exclude content unrelated to "construction workers" and "machine vision," leaving 137 papers for analysis.

Figure 2 illustrates the distribution of the 137 articles analyzed in this study by their year of publication and the number of citations they received in each year. It is evident that the number of publications in this field has grown rapidly since 2015. Moreover, the number of citations reveals that the research outcomes have received significant attention from the academic community. This trend could be attributed to the growing interest in intelligent construction technology and the advancements in computer vision technology since 2010. The popularity of computer vision was further enhanced with the introduction of the Faster R-CNN algorithm in 2015. Figure 3 displays the sources of the papers, including Automation in Construction, Journal of Computing in Civil Engineering, Journal of Construction Engineering and Management, Advanced Engineering Informatics, IEEE Access, etc. These journals are reputable and have considerable potential to make further contributions to the field.

FIGURE 2. The publication frequency across various years.

III. OVERVIEW OF REVIEWED LITERATURE
VOSViewer, a bibliometric software tool, was utilized to cluster the reviewed papers and identify current topics of interest in the field [29]. The authors utilized VOSViewer for an in-depth analysis of the papers' keywords, which led to the creation of a heat map (depicted in Figure 4). In the process of generating this heat map, each point on the map is assigned a color based on the density of the elements in its vicinity.
A higher density corresponds to a closer proximity to red, while lower density leans toward blue. The magnitude of this density is contingent upon the significance of these elements, providing a rapid overview of critical research areas [27]. It is evident that the central themes in the literature revolve around keywords such as "deep learning," "computer vision," "construction," and "workers," among others.
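To make the density-coloring mechanism concrete, the following minimal Python sketch estimates a weighted kernel density over keyword positions, which could then be rendered with a blue-to-red colormap. This is not VOSviewer's actual implementation, and the coordinates and occurrence weights are invented for illustration:

```python
# A minimal sketch of the density-coloring idea behind a keyword heat
# map: items sit at 2D coordinates, a kernel density is estimated, and
# high-density regions map to warm colors.
import numpy as np
from scipy.stats import gaussian_kde

# hypothetical keyword positions (x, y), weighted by occurrence counts
positions = np.array([[0.1, 0.2], [0.15, 0.25], [0.8, 0.7],
                      [0.82, 0.68], [0.5, 0.5]]).T
weights = np.array([30, 25, 12, 10, 5], dtype=float)

kde = gaussian_kde(positions, weights=weights)
xs, ys = np.mgrid[0:1:100j, 0:1:100j]
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)

# Higher density -> closer to red; lower -> blue
# (e.g., plt.imshow(density, cmap="jet")).
print("peak density near:", np.unravel_index(density.argmax(), density.shape))
```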
Subsequently, the authors analyzed the literature's authors and their institutions, and the results are visualized in Figure 5 and Figure 6. In the figures, larger fonts and nodes correspond to a higher volume of published literature. Various colors are used to denote distinct clusters, while the lines connecting the circles illustrate the interconnectedness between authors and their affiliations through cross-referencing [30]. The filtering condition for this analysis was a minimum of three publications, and the size of the bubbles in the figures indicates that Hong Kong Polytechnic University [31]-[51] and Huazhong University of Science and Technology [32]-[34], [36], [38], [42], [50]-[58] are currently leading the field. Other institutions conducting research in this area include Chung Ang University [59]-[65], the University of Illinois [66]-[73], Dalian University of Technology [74]-[79], and more. Figure 5 highlights that Li Heng's team [31]-[36], [38], [39], [41]-[51] at Hong Kong Polytechnic University and Luo Hanbin's team [33], [42], [52]-[58] at Huazhong University of Science and Technology are currently leading the field, with similar colors representing the same research area. Additionally, Li Jiaqi [75], [76], [78], Zhao Xuefeng [74]-[76], [78], [80], and Cai Jiannan [81]-[84] have shown potential in this field.

Figure 7 illustrates the research progress in each country or region, highlighting Mainland China's current leadership in this field. This may be attributed to the rapid growth of China's construction industry since the start of the 21st century: it surpassed the United States in construction output by 2012, and the gap widened over the following decade. However, this growth has also brought about negative consequences, including safety and environmental concerns, leading China to prioritize enhanced regulation, smart construction, and green construction. This concerted effort has catalyzed the swift advancement and implementation of computer vision technology in the construction industry. The United States, as the world's largest economy, has also made significant contributions to this field; its technology powerhouse status provides fertile ground for the development of artificial intelligence technology. Furthermore, South Korea, Australia, and other countries have also shown significant development in the "computer vision + worker construction behavior + construction" field.
IV. TECHNOLOGIES

A. OVERVIEW OF COMPUTER VISION TECHNIQUES
Computer Vision, also known as Machine Vision, is a technology that uses mathematical algorithms and computers to automate and simulate human vision. Its primary purpose is to extract valuable information from image and video data and then analyze, process, and understand it.

Traditional computer vision methods rely mainly on image processing and pattern recognition techniques. These methods include feature extraction techniques such as edge detection [85]. Image classification methods such as support vector machines (SVM) [86] can be used to classify objects based on their features. Object tracking methods such as the Kalman filter [87] can track the position and motion state of objects in consecutive frames. Additionally, segmentation methods based on thresholding, edges, regions, etc. [88] can divide images into different regions or parts. While these traditional methods can be useful for tasks such as image recognition, their application in complex scenarios is limited due to the need for manually designed features.
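As an illustration of these traditional building blocks, the sketch below applies Otsu thresholding for segmentation [88] and a constant-velocity Kalman filter [87] to a sequence of detection centers. The frame and measurements are placeholders, not data from any reviewed study:

```python
# A minimal sketch of two traditional techniques: Otsu thresholding for
# segmentation and a constant-velocity Kalman filter for tracking a
# single detection across frames (OpenCV).
import cv2
import numpy as np

# --- Threshold-based segmentation: separate foreground from background.
image = np.random.randint(0, 255, (240, 320), dtype=np.uint8)  # placeholder frame
_, mask = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# --- Kalman filter tracking: state = (x, y, vx, vy), measurement = (x, y).
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array(
    [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
kf.errorCovPost = np.eye(4, dtype=np.float32)

for cx, cy in [(50, 60), (54, 63), (59, 65)]:      # detected centers per frame
    prediction = kf.predict()                       # predicted (x, y, vx, vy)
    kf.correct(np.array([[cx], [cy]], np.float32))  # update with measurement
    print("predicted center:", prediction[:2].ravel())
```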
Currently, deep learning-based computer vision methods have become mainstream. Convolutional neural networks (CNNs) and their derivatives, such as ResNet [89], are widely used for various tasks, including image classification, detection, and segmentation. Recurrent neural networks (RNNs) [90] are used for sequential data processing, such as video, speech, and text. Object detection networks such as Faster R-CNN [91] and YOLO [92] are used to locate and measure the size of target objects in images, while segmentation networks such as Mask R-CNN [93] are used to segment images into different parts or regions. Generative adversarial networks (GANs) [94] are used for image generation and restoration, as they can produce high-quality images and enhance image restoration. Furthermore, long short-term memory networks (LSTMs) [95] can be used for visual tasks such as video classification and object tracking.

The computer vision technologies mentioned above provide powerful support for quickly extracting information from construction sites and identifying construction site conditions and worker behavior.
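For comparison with the traditional pipeline above, the following sketch shows how a deep learning detector is typically invoked. A COCO-pretrained Faster R-CNN [91] from torchvision stands in for the construction-specific detectors of the reviewed works, which fine-tune on site imagery; the input tensor is a placeholder:

```python
# A minimal sketch of deep learning-based worker detection with a
# pretrained Faster R-CNN. The COCO "person" class (label 1) is used as
# a stand-in for construction workers; production systems would
# fine-tune on site-specific images.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

frame = torch.rand(3, 480, 640)          # placeholder site image in [0, 1]
with torch.no_grad():
    output = model([frame])[0]           # dict of boxes, labels, scores

for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
    if label.item() == 1 and score.item() > 0.5:   # confident "person" detections
        print("worker candidate at", box.tolist(), "score", round(score.item(), 2))
```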
B. COMPUTER VISION ALGORITHMS' INNOVATION
The construction environment is characterized by its intricate and dynamic nature, where diverse categories of heavy machinery coexist and operate in parallel, alongside considerable personnel mobility. To better adapt computer vision technology to engineering needs, researchers have improved and innovated upon existing research results in the field of computer vision, updating the technical framework. For example, the recognition of steel bar-related activities performed by construction workers was addressed by Luo et al. [52] through a three-stream CNN that captures static spatial features, short-term motion, and long-term motion in video clips. Yang et al. [96] proposed the Spatial Temporal Relation Transformer (STR-Transformer), which better fuses temporal and spatial features in construction video clips; they also created a video clip dataset including seven types of construction workers' behaviors. Fang et al. [56] proposed an improved Faster R-CNN that achieves better multi-scale detection in detecting and locating workers and equipment on construction sites. Liu et al. [97] proposed a Multi-Domain Convolutional Neural Network to improve the tracking performance of construction workers. Huang et al. [98] achieved better construction worker hardhat wear detection performance by improving YOLO v3. Park et al. [99] introduced DIoU-NMS into YOLO v5, utilizing weighted triplet attention, feature-level expansion, and soft pooling to improve performance and enhance the detection of workers at construction sites, especially the ability to detect overlapping objects in complex environments. Other studies have also involved various computer vision methods for algorithm improvements [100]-[106].

Establishing a highly stable vision algorithm model is partly dependent on high-quality training data. While some open-source databases like COCO and ImageNet are available, they have limited applications in the construction industry. To address this limitation, Yang et al. [96] created a video clip dataset comprising seven types of construction workers' behavior. Wu et al. [107] also developed an open-access dataset called GDUT-HWD, which includes 3174 images for hardhat detection. Additionally, several studies have developed new databases, including Yang et al. [108], Luo et al. [52], Tian et al. [44], and Xiong et al. [48]. These efforts have resulted in an increased availability of training data for construction-related computer vision tasks.

Section IV.A emphasizes that different classes of vision algorithms fulfill distinct functions, and for construction sites with complex scenes, a single class of algorithm may not suffice for behavior recognition. Thus, combining multiple algorithms is necessary (a simplified sketch of this detect-then-track pattern follows Table 1). For instance, Xiao et al. [109] proposed a construction worker tracking method that utilizes the features of various algorithms, such as Mask R-CNN and the Kalman filter, for tracking multiple construction workers, even in the face of challenges like occlusion and feature scale variation. Piao et al. [110] proposed a computer vision framework that incorporates a Dynamic Bayesian Network, Openpose, and Faster R-CNN to assess the fall risk of construction workers during dynamic construction. Roberts et al. [66] developed a computer vision-based method for construction worker activity analysis, which included YOLO v3, Alphapose, and the CNN-based I3D. Their findings indicated that incorporating pose estimation enhances construction activity analysis. Fang et al. [38] integrated Faster R-CNN for detecting construction activities, SORT for tracking individuals, and face recognition for confirming the identity of workers, to eventually identify whether there are uncertified workers engaged in irrelevant construction tasks. Li et al. [78] used YOLO v5 to detect personal protective equipment (PPE) and Openpose to detect skeleton joints, incorporating the visual feature information into a 1D-CNN to determine whether the PPE was correctly used. Ding et al. [58] combined a CNN and an LSTM to identify construction workers' unsafe behaviors. Cai et al. [84] used Faster R-CNN to detect the head and body orientation of construction workers, and subsequently applied a multi-task learning network to assess the visual attention direction of construction workers. Fang et al. [51] combined Mask R-CNN and a Cascaded Pyramid Network to localize construction workers in monocular vision, then identified their unsafe behaviors. Table 1 presents other papers that integrated multiple algorithms to achieve improved visual recognition.

TABLE I
TYPICAL RESEARCH FOR ALGORITHMS INTEGRATED
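The pipelines above differ in their components, but most share a detect-then-associate backbone. The following simplified sketch (greedy IoU matching only; it is not the exact method of any cited work, which add Kalman prediction and appearance features) illustrates that core association step:

```python
# A simplified sketch of the detect-then-track integration pattern:
# per-frame worker boxes are linked across frames by greedy IoU matching.
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def update_tracks(tracks, detections, threshold=0.3):
    """Greedily match existing tracks to detections; start new tracks otherwise."""
    unmatched = list(detections)
    for track_id, box in list(tracks.items()):
        if not unmatched:
            break
        best = max(unmatched, key=lambda d: iou(box, d))
        if iou(box, best) >= threshold:
            tracks[track_id] = best
            unmatched.remove(best)
    for det in unmatched:                      # unmatched detections become new workers
        tracks[max(tracks, default=0) + 1] = det
    return tracks

tracks = {}
frame1 = [(100, 80, 140, 200)]                 # hypothetical worker boxes per frame
frame2 = [(104, 82, 144, 202), (300, 90, 340, 210)]
for dets in (frame1, frame2):
    tracks = update_tracks(tracks, dets)
print(tracks)   # {1: (104, 82, 144, 202), 2: (300, 90, 340, 210)}
```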
with a mixture density network to achieve the prediction of construction workers' action trajectories.

Computer vision-based detection and tracking methods can effectively assist project managers in analyzing the precise location information of construction workers. As computer vision technology continues to advance, the accuracy and stability of tracking methods will gradually improve, leading to enhanced tracking of individuals in 3D space and improved prediction of action trajectories.
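As a toy illustration of trajectory prediction from tracked positions, the sketch below extrapolates with a constant-velocity model; this is only the simplest stand-in for the mixture density network above, which instead predicts a probability distribution over future positions:

```python
# A toy sketch of trajectory prediction: constant-velocity extrapolation
# from the last two tracked (x, y) positions of a worker.
def predict_next(track, horizon=3):
    """Extrapolate future (x, y) positions from the last two observations."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * t, y1 + vy * t) for t in range(1, horizon + 1)]

observed = [(10.0, 5.0), (12.0, 6.0), (14.5, 7.0)]   # hypothetical worker centroids
print(predict_next(observed))   # [(17.0, 8.0), (19.5, 9.0), (22.0, 10.0)]
```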
B. RECOGNITION OF WORKERS' CONSTRUCTION ACTIVITIES
The accurate recognition of workers' construction activities plays a crucial role in enabling project managers to effectively monitor construction progress and optimize the allocation of labor resources. It serves as a valuable tool for analyzing labor productivity. Previous studies primarily employed traditional image processing algorithms for construction activity recognition. For instance, Liu et al. [125] utilized a silhouette-based approach, while Yang et al. [108] employed an SVM approach for classifying image features. With technological advancements, Yang et al. [126] enhanced the method by incorporating data-driven scene parsing while retaining dense trajectories.

In the last several years, the advancement of convolutional neural networks has led to reduced training costs for object detection algorithms, as well as improved detection speed and accuracy. These advancements have facilitated the application of object detection algorithms in construction activity recognition. Luo et al. [43] employed Faster R-CNN for the identification of construction workers and various entities depicted in images captured at construction sites, then employed a relevance network to recognize multiple construction activities by analyzing the spatial relationships between them. Similar studies include Fang et al. [38], who employed SORT and face recognition for the detection of non-certified work, and Li et al. [75], who used CenterNet to detect construction workers and objects to evaluate construction productivity while recognizing reinforcement assembly activities.

Extensive investigations in the domain of construction activity recognition have been conducted by the research group led by Luo Xiaochun and Li Heng, delving into the amalgamation of object detection algorithms with complementary methodologies. For instance, they combined YOLO v3 with SORT and a 3D CNN to achieve spatial localization and construction activity recognition of construction workers [41]. In another study, the researchers utilized YOLO v3, SORT, KCF, C3D, and CRF algorithms in multiple steps, enabling the recognition of construction activities performed by workers in groups [50]. Considering the spatio-temporal properties of construction activities, Li et al. [76] directly employed the Faster R-CNN algorithm to recognize construction activities in images, simultaneously detecting construction workers and determining the activities performed by different operators based on the spatial relationship between individuals and activities. Bhokare et al. [127] also proposed a method based on YOLO v3 for direct recognition of construction activities.

To effectively integrate construction activities with spatio-temporal information, utilizing video clips for recognition is a more promising approach. Previous studies have proposed the use of multistream convolutional neural networks [45], [52]. In recent years, research methods based on videos have become more diverse. Roberts et al. [66] incorporated pose estimation algorithms into video-based construction activity recognition. Similarly, Cai et al. [128] introduced an attention direction estimation method to identify groups of construction workers and subsequently classified their activities using an LSTM. Li et al. [116] identified three activities of construction workers (throwing, operating, and crossing) through YOLO and ST-GCN. Torabi et al. [129] and Yang et al. [96] employed YOWO and Transformer architectures, respectively, demonstrating high recognition accuracy on today's advanced computing hardware. Li and Li [79] applied Openpose and a GAN to estimate the complete skeleton joints of construction workers under occlusion, then used ResNet to recognize construction activities.
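The video-based methods above share the same interface: a short clip in, an activity label out. The following sketch uses torchvision's Kinetics-pretrained r3d_18 as a stand-in for the construction-specific 3D CNNs of the cited studies, which are trained on site activity labels; the clip tensor is a placeholder:

```python
# A minimal sketch of video-clip activity recognition with a 3D CNN:
# a short clip is classified as a whole, so temporal motion cues are
# used alongside spatial appearance.
import torch
from torchvision.models.video import r3d_18

model = r3d_18(weights="DEFAULT").eval()

# placeholder clip: batch x channels x frames x height x width
clip = torch.rand(1, 3, 16, 112, 112)
with torch.no_grad():
    logits = model(clip)                       # one score per action class

predicted_class = logits.argmax(dim=1).item()  # index into the (Kinetics) label set
print("predicted activity id:", predicted_class)
```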
Table 2 presents the comprehensive details of the reviewed literature in this section. The literature encompasses a broad spectrum of application scenarios, extending beyond the physical activities of construction workers to various processes within civil engineering construction. These technologies hold significant potential for facilitating intelligent construction and enhancing project management.

TABLE II
CONSTRUCTION ACTIVITIES RECOGNITION RESEARCH DETAILS
construction workers. Additionally, Seo et al. [134] compared multiple pose estimation methods and concluded that the sensor-based approach yielded the least error.

In the past few years, the rise of deep learning methods has facilitated direct vision-based pose estimation. Zhang et al. [46] employed a multi-stage convolutional neural network to estimate the 3D pose of construction workers using monocular vision. Yu et al. [36] utilized a vision-based approach to obtain the 3D spatial pose of construction workers and further collected ergonomic information through an intelligent insole, enabling the analysis of their physical condition. In another study, Yu et al. [47] curated a construction posture dataset encompassing various postures to facilitate 3D posture estimation. Chu et al. [135] developed an ergonomic 3D posture assessment framework for construction workers, which integrated tracking algorithms, 2D detection, and 3D body generation. Kim et al. [136] constructed a synthetic dataset to enhance 3D pose estimation, while Tian et al. [44] compiled a construction worker motion dataset that can contribute to 3D pose recognition. Table 3 provides a list of typical studies in pose estimation.

TABLE III
TYPICAL RESEARCH FOR POSE ESTIMATION
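To illustrate the pose-based ergonomic analyses listed in Table 3, the sketch below extracts 2D skeleton joints with torchvision's Keypoint R-CNN (a stand-in for the OpenPose and multi-stage networks in the cited studies; 3D methods add a lifting step) and computes a joint angle of the kind used in posture assessment. The image is a placeholder:

```python
# A minimal sketch of vision-based posture analysis: 2D skeleton joints
# from a pretrained keypoint detector, then an ergonomic joint angle.
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn

model = keypointrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = torch.rand(3, 480, 640)                  # placeholder site image
with torch.no_grad():
    person = model([image])[0]                   # keypoints: persons x 17 x (x, y, vis)

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c, e.g., the elbow."""
    v1, v2 = a - b, c - b
    cos = (v1 @ v2) / (v1.norm() * v2.norm() + 1e-9)
    return torch.rad2deg(torch.acos(cos.clamp(-1, 1))).item()

if len(person["keypoints"]) > 0:
    kp = person["keypoints"][0][:, :2]           # first detected person, (x, y) only
    # COCO keypoint order: 5 = left shoulder, 7 = left elbow, 9 = left wrist
    print("left elbow angle:", joint_angle(kp[5], kp[7], kp[9]))
```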
inference. Yan et al. [35] employed the Faster R-CNN algorithm to detect key points of construction trucks, followed by 3D reconstruction to establish distance recognition in the 3D scene. Shin et al. [146] integrated YOLO v3 and the Openpose algorithm to identify collision risks between construction workers and trucks. Zhang and Ge [77] employed the latest Transformer technology to identify dynamic collision risks between construction workers and tower cranes. Additionally, other studies have contributed to collision risk identification [64], [147], [148] and fall risk identification [149]. Table 4 presents some of the typical research literature in this field.

TABLE IV
TYPICAL RESEARCH FOR UNSAFE BEHAVIOR RECOGNITION
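A simplified sketch of the proximity-risk logic underlying the struck-by studies in Table 4 follows. The pixel-to-meter scale and safety threshold are illustrative assumptions; the cited works calibrate distances via 3D reconstruction or pose cues:

```python
# A simplified sketch of struck-by risk flagging: given per-frame boxes
# for a worker and a piece of equipment, flag a potential hazard when
# their approximate ground-plane distance falls below a threshold.
import math

def bottom_center(box):
    """Approximate ground contact point of a bounding box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, y2)

def struck_by_risk(worker_box, equipment_box,
                   pixels_per_meter=50, safe_distance_m=2.0):
    wx, wy = bottom_center(worker_box)
    ex, ey = bottom_center(equipment_box)
    distance_m = math.hypot(wx - ex, wy - ey) / pixels_per_meter  # assumed scale
    return distance_m < safe_distance_m, distance_m

risk, d = struck_by_risk((100, 80, 140, 300), (180, 60, 380, 320))
print(f"risk={risk}, distance={d:.2f} m")
```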
3) PPE'S USAGE INSPECTION
Safety accidents are prevalent in engineering construction, with a significant number of casualties resulting from fall accidents and object strikes, as reported by the Ministry of Housing and Urban-Rural Development of the People's Republic of China and the U.S. Bureau of Labor Statistics [152]-[154]. To mitigate such casualties, various countries have implemented mandatory policies and regulations regarding personal protective equipment (PPE). For instance, the State Administration of Quality Supervision, Inspection and Quarantine of China and the U.S. Occupational Safety and Health Administration have specified the use of hardhats during construction operations [155], [156], and the requirement for safety harnesses when working at heights. With the advancements in artificial intelligence technology, numerous computer vision-based studies have emerged to address the detection of PPE usage.

Before the rapid advancement of deep learning techniques, traditional methods for processing image features and machine learning algorithms were more commonly employed. For instance, Park et al. [157] utilized image processing methods to extract the spatial relationship between construction workers and hardhats; they subsequently employed an SVM to detect and match this feature, determining whether construction workers were wearing hardhats or not. Mneymneh et al. [158]-[160] extracted various image features such as SURF and employed template matching and cascade classifiers for hardhat detection. In order to achieve better performance in safety vest detection, Seong et al. [161] compared various features and classifier algorithms; the results demonstrated that combining the C4.5 classifier with YCbCr and an SVM classifier yielded superior outcomes. Similar to the previous application areas, deep learning methods are rapidly replacing traditional techniques. Fang et al. [55] employed Faster R-CNN to detect the usage of safety harnesses by construction workers at heights. In another study conducted by the same group, Fang et al. [42] employed Faster R-CNN for far-field monitoring at construction sites to detect hardhat wearing.
The utilization of the Faster R-CNN algorithm has significantly enhanced image recognition accuracy [162]. Fu et al. [163] employed this algorithm for hardhat wearing detection, while Kamoona et al. [164] applied it to detect high-visibility vests.

As deep learning algorithms continue to advance, Delhi et al. [165] applied YOLO v3 for detecting the usage of hardhats and jackets among construction workers. YOLO v3 is an algorithm that has made significant strides in the field of object detection, leading to extensive research employing or enhancing this algorithm to detect the personal protective equipment (PPE) of construction workers [98], [102], [166], [167]. Among them, Nath et al. [166] developed three distinct models based on the YOLO v3 algorithm; in their second model, the algorithm simultaneously detected individual workers and verified PPE compliance using a single CNN framework, achieving a mean average precision (mAP) of 72.3%. Additionally, beyond v3, the method has been continuously updated to incorporate algorithms such as YOLO v4 and v5, which exhibit even better performance in hardhat detection [100], [101], [104], [168]-[173]. Nguyen et al. [101] created a dataset comprising 11,978 images and tested various versions of YOLO v5. Experimental results indicated that the enhanced YOLO v5s demonstrated the best detection performance, achieving a precision of 0.74 on their custom dataset.
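Beyond raw detection, most hardhat-compliance methods add a rule- or learning-based association between worker and PPE boxes. The following minimal sketch (thresholds and boxes are invented for illustration, not taken from any cited work) shows the geometric form of that check:

```python
# A minimal sketch of the rule-based step that often follows PPE
# detection: a worker counts as compliant if a detected hardhat box
# lies in the upper region of the worker's box.
def contains_head_region(worker_box, hardhat_box, head_fraction=0.3):
    wx1, wy1, wx2, wy2 = worker_box
    hx1, hy1, hx2, hy2 = hardhat_box
    head_limit = wy1 + head_fraction * (wy2 - wy1)   # top 30% of the worker box
    cx, cy = (hx1 + hx2) / 2, (hy1 + hy2) / 2        # hardhat center
    return wx1 <= cx <= wx2 and wy1 <= cy <= head_limit

workers = [(100, 50, 160, 250), (300, 60, 360, 260)]     # hypothetical detections
hardhats = [(110, 55, 150, 90)]

for i, w in enumerate(workers):
    compliant = any(contains_head_region(w, h) for h in hardhats)
    print(f"worker {i}: {'hardhat on' if compliant else 'non-hardhat-use'}")
```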
To enhance detection accuracy, researchers have explored various algorithms [61], [105], [174]-[179] and integrated multiple algorithms to address the PPE monitoring challenge [60], [78], [111], [114], [115], [180]-[184]. Some scholars have also developed open-source PPE usage datasets for the broader academic community [107]. Xu et al. [175] proposed a novel detection strategy called the matching-recheck strategy and evaluated safety harness detection using the newly introduced Efficient YOLO v5 on a custom dataset, achieving 94% mAP. Chen and Demachi [184] utilized YOLO v3 and Openpose to extract features related to PPE and individual joints; they employed a scene graph representation to describe the spatial relationship between joints and PPE, enabling the detection of multiple PPE items such as hardhats, dust masks, safety glasses, and safety belts. Li et al. [78] introduced the hierarchy of control and emphasized the importance of ensuring correct usage of PPE from an administrative control perspective; by combining the features extracted by YOLO v5 and Openpose, they employed a 1D-CNN for classification and detection of incorrect usage of hardhats and safety harnesses. Table 5 provides further details on some typical studies.

TABLE V
TYPICAL RESEARCH FOR PPE'S USAGE INSPECTION

D. SUMMARY
This section presents an overview of computer vision applications for recognizing construction worker behavior on construction sites, primarily focusing on occupational health and safety risks. These articles represent significant advancements in utilizing computer vision technology to enhance both productivity and safety in engineering construction. Compared to other sensing and monitoring technologies, computer vision offers immense potential for development, cost-effectiveness, and a wide range of applications. According to the survey results, CNNs emerge as the most widely employed technique among researchers, leveraging their ability to rapidly extract features compared to traditional machine learning algorithms. With the continuous evolution of CNNs, object detection technology has been extensively adopted, demonstrating strong adaptability in monitoring worker construction behavior. It has achieved breakthroughs in detection performance and real-time monitoring, enabling the tracking of construction workers' location, activities, and safety risks. As object detection technology continues to advance and Transformer technology gains prominence, computer vision holds the promise of making even greater contributions to the field's development.
attention in future research, which can establish a large-scale construction activities database for the development of computer vision-based approaches.

Engineering construction sites present intricate scenarios, with environmental variations observed across different locations. Some research is carried out in laboratory settings for image acquisition and testing, while other studies collect data from specific engineering projects. Consequently, the performance and effectiveness of these approaches cannot be assured when applied to different sites. To illustrate, considering personal protective equipment (PPE) usage detection, numerous studies report accuracy rates surpassing 90%; however, these achievements are often based on self-constructed datasets for testing rather than open-source public datasets.

Future research endeavors should focus on constructing extensive public datasets encompassing diverse facets of worker construction behavior. Such datasets should span static images and dynamic videos, covering a spectrum of construction scenarios across various project types. It is strongly advised that researchers employ these public datasets during the development of algorithmic frameworks. Doing so would significantly contribute to the broader comprehension of algorithms among researchers, fostering collaboration and knowledge exchange.

3) GENERATIVE AI'S RESPONSE
In 2023, generative AI emerged as a prominent field within AI, with ChatGPT [188] and NewBing [189] standing out as two highly popular generative AI models. ChatGPT, developed by OpenAI, is a conversational AI model built upon the GPT-3.5 series models, which have been trained using reinforcement learning techniques. NewBing, on the other hand, is an interactive tool offered by Microsoft that incorporates ChatGPT technology to provide more innovative insights and suggestions.

To gather Q&A information related to generative AI, Dr. Yang Zhaozheng from the University of Strasbourg adopted both NewBing and ChatGPT, directing specific inquiries to each model. It should be noted that the statements presented by generative AI models do not reflect the perspectives of Dr. Yang or any of the contributors to this article.

Here are some suggestions for future research directions given by the generative AI models:
a. Enhance object detection and recognition algorithms for construction workers in complex indoor environments.
b. Improve accuracy and robustness despite variations in lighting conditions, occlusions, camera movements, and background noise.
c. Combine computer vision with technologies like deep learning, IoT, augmented reality, and digital twins for better performance in progress monitoring systems.
d. Extend progress monitoring beyond physical aspects to psychological, social, and environmental factors affecting workers' well-being and behavior.
e. Establish standard benchmarks and evaluation metrics for comparing different computer vision-based progress monitoring methods and systems.
f. Integrate multi-sensor data, such as wearable sensors or GPS, with visual data for a more comprehensive understanding of workers' behavior.
g. Develop worker-centered monitoring approaches while respecting privacy and autonomy.
h. Explore robust monitoring systems that can adapt to changing environments, including lighting conditions and weather.
i. Implement real-time monitoring and intervention systems to provide immediate feedback to workers and prevent potential hazards.

Moreover, generative AI models offer substantial potential benefits. Excellent design capabilities are exemplified by Cadence's Allegro X AI technology; this generative AI model facilitates the optimization of the system design process. Furthermore, generative AI models' capacity holds promise for creating algorithms to recognize construction workers' behaviors. When this technology matures, an AI model could seamlessly execute automated testing of product features, including compatibility testing for software applications. Additionally, in the realm of construction management, where decision-making is integral, generative AI models can prove invaluable by providing targeted advice.

VII. CONCLUSIONS
This paper presents a comprehensive review of the literature pertaining to the monitoring of construction workers' behavior using computer vision technology. Initially, the researchers conducted a keyword search on the WOS platform, resulting in 596 papers from the past decade. Through manual inspection, 137 papers were selected for further analysis. The distribution of publication years indicates that the field is undergoing rapid development, with ongoing updates to technical methods. The authors employed VOSViewer software to identify research hotspots and highlight notable researchers and institutions in the field. The paper provides an overview of computer vision techniques and technological advancements in worker behavior recognition. The reviewed literature is categorized into three sections based on application scenarios: "Detection, localization and tracking for construction workers," "Recognition of workers' construction activities," and "Occupational health and safety behavior monitoring." The abundance of literature demonstrates the preference of current researchers for employing computer vision to address safety concerns in engineering construction. The analysis reveals a trend wherein vision-based approaches are beginning to be adopted in real-world engineering, impacting decision-making processes in civil engineering. Across different application scenarios, early research inclined towards addressing challenges through conventional image processing methods and machine-learning classification.
However, as convolutional neural networks (CNNs) and object detection techniques matured, vision methods exhibited enhanced robustness, adept at handling behavior recognition issues even in intricate scenarios, leading to the swift displacement of traditional methods. Recent years have witnessed an increase in the integration of multiple algorithms to address real-world challenges, marking a prevalent trend. Additionally, the emergence of Transformer technology holds the promise of transcending existing algorithmic limitations in this domain.

The authors conclude by summarizing research gaps and outlining future directions. Further optimization is required in terms of accuracy and speed, especially when integrating multiple algorithms, necessitating the establishment of a unified framework platform and integration strategy. In terms of application scenarios, improvements can be made to enhance scene applicability and diversify data sources. Furthermore, insights from the generative AI models underscore the importance of considering the privacy and autonomy of construction workers, suggesting their involvement in monitoring projects. This paper serves as a reference for engineers and researchers, aiming to contribute to the field of engineering and construction while benefiting fellow scholars' scientific studies.

ACKNOWLEDGMENT
Thanks are due to Dr. Yang Zhaozheng from the University of Strasbourg for assistance with the generative AI models.

REFERENCES
[1] Y. Zhang, T. Wang, and K.-V. Yuen, "Construction site information decentralized management using blockchain and smart contracts," Comput.-Aided Civ. Infrastruct. Eng., vol. 37, no. 11, pp. 1450–1467, 2022, doi: 10.1111/mice.12804.
[2] M. Yihua and X. Tuo, "Research of 4M1E's effect on engineering quality based on structural equation model," Syst. Eng. Procedia, vol. 1, pp. 213–220, Jan. 2011, doi: 10.1016/j.sepro.2011.08.034.
[3] H. Zhou, Y. Zhao, Q. Shen, L. Yang, and H. Cai, "Risk assessment and management via multi-source information fusion for undersea tunnel construction," Autom. Constr., vol. 111, p. 103050, Mar. 2020, doi: 10.1016/j.autcon.2019.103050.
[4] C. Sheehan, R. Donohue, T. Shea, B. Cooper, and H. De Cieri, "Leading and lagging indicators of occupational health and safety: The moderating role of safety leadership," Accid. Anal. Prev., vol. 92, pp. 130–138, Jul. 2016, doi: 10.1016/j.aap.2016.03.018.
[5] Opinions on promoting the sustainable and healthy development of the construction industry, General Office of the State Council. Accessed: Apr. 20, 2023. [Online]. Available: http://www.gov.cn/zhengce/zhengceku/2017-02/24/content_5170625.htm
[6] "Bringing Innovation to the Worksite with 'Smart Construction'," The Government of Japan - JapanGov. Accessed: Apr. 20, 2023. [Online]. Available: https://www.japan.go.jp/tomodachi/2017/autumn2017/power_of_innovation.html
[7] T. Dawood, Z. Zhu, and T. Zayed, "Computer Vision–Based Model for Moisture Marks Detection and Recognition in Subway Networks," J. Comput. Civ. Eng., vol. 32, no. 2, p. 04017079, Mar. 2018, doi: 10.1061/(ASCE)CP.1943-5487.0000728.
[8] L. Xu, E. Xu, and L. Li, "Industry 4.0: State of the art and future trends," Int. J. Prod. Res., vol. 56, pp. 1–22, Mar. 2018, doi: 10.1080/00207543.2018.1444806.
[9] "Construction 2025: strategy," GOV.UK. Accessed: Apr. 20, 2023. [Online]. Available: https://www.gov.uk/government/publications/construction-2025-strategy
[10] Notice on the issuance of the "14th Five-Year Plan" for the development of the construction industry. Accessed: Apr. 20, 2023. [Online]. Available: http://www.gov.cn/zhengce/zhengceku/2022-01/27/content_5670687.htm
[11] A. Montaser and O. Moselhi, "RFID indoor location identification for construction projects," Autom. Constr., vol. 39, pp. 167–179, Apr. 2014, doi: 10.1016/j.autcon.2013.06.012.
[12] N. Nath, R. Akhavian, and A. Behzadan, "Ergonomic analysis of construction worker's body postures using wearable mobile sensors," Appl. Ergon., vol. 62, pp. 107–117, Jul. 2017, doi: 10.1016/j.apergo.2017.02.007.
[13] L. Joshua and K. Varghese, "Accelerometer-Based Activity Recognition in Construction," J. Comput. Civ. Eng., vol. 25, pp. 370–379, Sep. 2011, doi: 10.1061/(ASCE)CP.1943-5487.0000097.
[14] Y. Lee, M. Scarpiniti, and A. Uncini, "Advanced Sound Classifiers and Performance Analyses for Accurate Audio-Based Construction Project Monitoring," J. Comput. Civ. Eng., vol. 34, p. 04020030, Sep. 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000911.
[15] Y. Zhang, Y.-Q. Ni, X. Jia, and Y.-W. Wang, "Identification of concrete surface damage based on probabilistic deep learning of images," Autom. Constr., vol. 156, p. 105141, Dec. 2023, doi: 10.1016/j.autcon.2023.105141.
[16] Y. Tang et al., "Novel visual crack width measurement based on backbone double-scale features for improved detection automation," Eng. Struct., vol. 274, p. 115158, Jan. 2023, doi: 10.1016/j.engstruct.2022.115158.
[17] Z. Wu, Y. Tang, B. Hong, B. Liang, and Y. Liu, "Enhanced Precision in Dam Crack Width Measurement: Leveraging Advanced Lightweight Network Identification for Pixel-Level Accuracy," Int. J. Intell. Syst., vol. 2023, p. e9940881, Sep. 2023, doi: 10.1155/2023/9940881.
[18] X. Botao, L. Zhang, M. Ding, W. Li, and X. Zhao, "Strain measurement based on cooperative operation with different smartphones," Comput.-Aided Civ. Infrastruct. Eng., vol. 38, Sep. 2022, doi: 10.1111/mice.12919.
[19] Y. Tang et al., "Seismic performance evaluation of recycled aggregate concrete-filled steel tubular columns with field strain detected via a novel mark-free vision method," Structures, vol. 37, pp. 426–441, Mar. 2022, doi: 10.1016/j.istruc.2021.12.055.
[20] Y. Zhang, X. Zhao, and P. Liu, "Multi-Point Displacement Monitoring Based on Full Convolutional Neural Network and Smartphone," IEEE Access, vol. 7, pp. 139628–139634, 2019, doi: 10.1109/ACCESS.2019.2943599.
[21] Y. Zhang and K.-V. Yuen, "Bolt damage identification based on orientation-aware center point estimation network," Struct. Health Monit., vol. 21, p. 147592172110042, Mar. 2021, doi: 10.1177/14759217211004243.
[22] W. Ma, "Technical framework of energy-saving construction management of intelligent building based on computer vision algorithm," Soft Comput., May 2023, doi: 10.1007/s00500-023-08424-1.
[23] K. Mostafa and T. Hegazy, "Review of image-based analysis and applications in construction," Autom. Constr., vol. 122, p. 103516, Feb. 2021, doi: 10.1016/j.autcon.2020.103516.
[24] M. Zhang, R. Shi, and Z. Yang, "A critical review of vision-based occupational health and safety monitoring of construction site workers," Saf. Sci., vol. 126, p. 104658, Jun. 2020, doi: 10.1016/j.ssci.2020.104658.
[25] W. Fang, P. E. D. Love, H. Luo, and L. Ding, "Computer vision for behaviour-based safety in construction: A review and future directions," Adv. Eng. Inform., vol. 43, p. 100980, Jan. 2020, doi: 10.1016/j.aei.2019.100980.
[26] F. Luo, R. Y. M. Li, M. J. C. Crabbe, and R. Pu, "Economic development and construction safety research: A bibliometrics approach," Saf. Sci., vol. 145, p. 105519, Jan. 2022, doi: 10.1016/j.ssci.2021.105519.
[27] L. Zeng and R. Y. M. Li, "Construction safety and health hazard awareness in Web of Science and Weibo between 1991 and 2021," Saf. Sci., vol. 152, p. 105790, Aug. 2022, doi: 10.1016/j.ssci.2022.105790.
[28] D. Fang, Y. Huang, H. Guo, and H. W. Lim, "LCB approach for construction safety," Saf. Sci., vol. 128, p. 104761, Aug. 2020, doi: 10.1016/j.ssci.2020.104761.
[29] U. A. Bukar, M. S. Sayeed, S. F. A. Razak, S. Yogarayan, O. A. Amodu, and R. A. R. Mahmood, "A method for analyzing text using VOSviewer," MethodsX, vol. 11, Dec. 2023, doi: 10.1016/j.mex.2023.102339.
[30] N. J. van Eck and L. Waltman, "Software survey: VOSviewer, a computer program for bibliometric mapping," Scientometrics, vol. 84, no. 2, pp. 523–538, Aug. 2010, doi: 10.1007/s11192-009-0146-3.
[31] Y. Yu, X. Yang, H. Li, X. Luo, H. Guo, and Q. Fang, "Joint-Level Vision-Based Ergonomic Assessment Tool for Construction Workers," J. Constr. Eng. Manag., vol. 145, May 2019, doi: 10.1061/(ASCE)CO.1943-7862.0001647.
[32] H. Wu, B. Zhong, H. Li, and N. Zhao, "Combining computer vision with semantic reasoning for on-site safety management in construction," J. Build. Eng., vol. 42, p. 103036, Oct. 2021, doi: 10.1016/j.jobe.2021.103036.
[33] K. Ting, W. Fang, H. Luo, S. Xu, and H. Li, "Computer vision and long short-term memory: Learning to predict unsafe behaviour in construction," Adv. Eng. Inform., vol. 50, p. 101400, Oct. 2021, doi: 10.1016/j.aei.2021.101400.
[34] Y. Yu, H. Li, X. Yang, L. Kong, X. Luo, and A. Wong, "An automatic and non-invasive physical fatigue assessment method for construction workers," Autom. Constr., vol. 103, pp. 1–12, Jul. 2019, doi: 10.1016/j.autcon.2019.02.020.
[35] X. Yan, H. Zhang, and H. Li, "Computer vision-based recognition of 3D relationship between construction entities for monitoring struck-by accidents," Comput.-Aided Civ. Infrastruct. Eng., pp. 1–16, Sep. 2020, doi: 10.1111/mice.12536.
[36] Y. Yu et al., "Automatic Biomechanical Workload Estimation for Construction Workers by Computer Vision and Smart Insoles," J. Comput. Civ. Eng., vol. 33, May 2019, doi: 10.1061/(ASCE)CP.1943-5487.0000827.
[37] J. Seo, K. Yin, and S. Lee, "Automated Postural Ergonomic Assessment Using a Computer Vision-Based Posture Classification," 2016, p. 818, doi: 10.1061/9780784479827.082.
[38] Q. Fang et al., "A deep learning-based method for detecting non-certified work on construction sites," Adv. Eng. Inform., vol. 35, pp. 56–68, Jan. 2018, doi: 10.1016/j.aei.2018.01.001.
[39] X. Yan, H. Zhang, and H. Li, "Estimating Worker-Centric 3D Spatial Crowdedness for Construction Safety Management Using a Single 2D Camera," J. Comput. Civ. Eng., vol. 33, p. 04019030, Jun. 2019, doi: 10.1061/(ASCE)CP.1943-5487.0000844.
[40] H. Liu, G. Wang, T. Huang, P. He, M. Skitmore, and X. Luo, "Manifesting construction activity scenes via image captioning," Autom. Constr., vol. 119, p. 103334, Nov. 2020, doi: 10.1016/j.autcon.2020.103334.
[41] X. Luo, H. Li, H. Wang, Z. Wu, F. Dai, and D. Cao, "Vision-based detection and visualization of dynamic workspaces," Autom. Constr., vol. 104, pp. 1–13, Aug. 2019, doi: 10.1016/j.autcon.2019.04.001.
[42] Q. Fang et al., "Detecting non-hardhat-use by a deep learning method from far-field surveillance videos," Autom. Constr., vol. 85, pp. 1–9, Jan. 2018, doi: 10.1016/j.autcon.2017.09.018.
[43] X. Luo, H. Li, D. Cao, F. Dai, J. Seo, and S. Lee, "Recognizing Diverse Construction Activities in Site Images via Relevance Networks of Construction Related Objects Detected by Convolutional Neural Networks," J. Comput. Civ. Eng., vol. 32, Nov. 2017, doi: 10.1061/(ASCE)CP.1943-5487.0000756.
[44] Y. Tian, H. Li, H. Cui, and J. Chen, "Construction motion data library: an integrated motion dataset for on-site activity recognition," Sci. Data, vol. 9, Nov. 2022, doi: 10.1038/s41597-022-01841-1.
[45] X. Luo, H. Li, D. Cao, Y. Yu, X. Yang, and T. Huang, "Towards efficient and objective work sampling: Recognizing workers' activities in site surveillance videos with two-stream convolutional networks,"
[47] Y. Yu, H. Li, J. Cao, and X. Luo, "Three-Dimensional Working Pose Estimation in Industrial Scenarios With Monocular Camera," IEEE Internet Things J., vol. PP, pp. 1–1, Aug. 2020, doi: 10.1109/JIOT.2020.3014930.
[48] R. Xiong, Y. Song, H. Li, and Y. Wang, "Onsite Video Mining for Construction Hazards Identification with Visual Relationships," Adv. Eng. Inform., vol. 42, Jul. 2019, doi: 10.1016/j.aei.2019.100966.
[49] X. Luo, H. Li, X. Yang, Y. Yu, and D. Cao, "Capturing and Understanding Workers' Activities in Far-Field Surveillance Videos with Deep Action Recognition and Bayesian Nonparametric Learning," Comput.-Aided Civ. Infrastruct. Eng., vol. 34, Oct. 2018, doi: 10.1111/mice.12419.
[50] X. Luo, H. Li, Y. Yu, C. Zhou, and D. Cao, "Combining deep features and activity context to improve recognition of activities of workers in groups," Comput.-Aided Civ. Infrastruct. Eng., vol. 35, Feb. 2020, doi: 10.1111/mice.12538.
[51] Q. Fang, H. Li, X. Luo, C. Li, and W. An, "A sematic and prior-knowledge-aided monocular localization method for construction-related entities," Comput.-Aided Civ. Infrastruct. Eng., vol. 35, Mar. 2020, doi: 10.1111/mice.12541.
[52] H. Luo, C. Xiong, W. Fang, P. E. D. Love, B. Zhang, and X. Ouyang, "Convolutional neural networks: Computer vision-based workforce activity assessment in construction," Autom. Constr., vol. 94, pp. 282–289, Oct. 2018, doi: 10.1016/j.autcon.2018.06.007.
[53] W. Fang, B. Zhong, N. Zhao, H. Luo, J. Xue, and S. Xu, "A deep learning-based approach for mitigating falls from height with computer vision: Convolutional neural network," Adv. Eng. Inform., vol. 39, pp. 170–177, Jan. 2019, doi: 10.1016/j.aei.2018.12.005.
[54] R. Wei, W. Fang, H. Luo, and S. Xu, "Recognizing People's Identity in Construction Sites with Computer Vision: A Spatial and Temporal Attention Pooling Network," Adv. Eng. Inform., vol. 42, Aug. 2019, doi: 10.1016/j.aei.2019.100981.
[55] W. Fang, L. Ding, and H. Luo, "Falls from Heights: A Computer Vision-based Approach for Safety Harness Detection," Autom. Constr., vol. 91, pp. 53–61, Feb. 2018, doi: 10.1016/j.autcon.2018.02.018.
[56] W. Fang, L. Ding, B. Zhong, P. E. D. Love, and H. Luo, "Automated detection of workers and heavy equipment on construction sites: A convolutional neural network approach," Adv. Eng. Inform., vol. 37, pp. 139–149, Aug. 2018, doi: 10.1016/j.aei.2018.05.003.
[57] J. Liu, W. Fang, T. Hartmann, H. Luo, and L. Wang, "Detection and location of unsafe behaviour in digital images: A visual grounding approach," Adv. Eng. Inform., vol. 53, p. 101688, Aug. 2022, doi: 10.1016/j.aei.2022.101688.
[58] L. Ding, W. Fang, H. Luo, B. Zhong, and X. Ouyang, "A Deep Hybrid Learning Model to Detect Unsafe Behavior: Integrating Convolution Neural Networks and Long Short-Term Memory," Autom. Constr., vol. 86, p. 124, Feb. 2018, doi: 10.1016/j.autcon.2017.11.002.
[59] S. Anjum, N. Khan, R. Khalid, M. Khan, L. Dongmin, and C. Park, "Fall Prevention From Ladders Utilizing a Deep Learning-Based Height Assessment Method," IEEE Access, vol. 10, pp. 1–1, Jan. 2022, doi: 10.1109/ACCESS.2022.3164676.
[60] M. Khan, R. Khalid, S. Anjum, S. Tran, and C. Park, "Fall Prevention from Scaffolding Using Computer Vision and IoT-Based Monitoring," J. Constr. Eng. Manag., vol. 148, pp. 1–15, Apr. 2022, doi: 10.1061/(ASCE)CO.1943-7862.0002278.
[61] J. Lim, D. Jung, C. Park, and D. Kim, "Computer Vision Process Development regarding Worker's Safety Harness and Hook to Prevent Fall Accidents: Focused on System Scaffolds in South Korea," Adv. Civ. Eng., vol. 2022, Jul. 2022, doi: 10.1155/2022/4678479.
[62] H. Son, H. Choi, H. Seong, and C. Kim, "Detection of construction workers under varying poses and changing background in image sequences via very deep residual networks," Autom. Constr., vol. 99, pp. 27–38, Mar. 2019, doi: 10.1016/j.autcon.2018.11.033.
[63] H. Son and C. Kim, "Integrated worker detection and tracking for the
Autom. Constr., vol. 94, pp. 360–370, Oct. 2018, doi: safe operation of construction machinery,” Autom. Constr., vol. 126, p.
10.1016/j.autcon.2018.07.011. 103670, Jun. 2021, doi: 10.1016/j.autcon.2021.103670.
[46] H. Zhang, X. Yan, and H. Li, “Ergonomic posture recognition using [64] H. Son, H. Seong, H. Choi, and C. Kim, “Real-Time Vision-Based
3D view-invariant features from single ordinary camera,” Autom. Warning System for Prevention of Collisions between Workers and
Constr., vol. 94, pp. 1–10, Oct. 2018, doi: Heavy Equipment,” J. Comput. Civ. Eng., vol. 33, p. 04019029, Sep.
10.1016/j.autcon.2018.05.033. 2019, doi: 10.1061/(ASCE)CP.1943-5487.0000845.
[65] N. Khan, M. R. Saleem, D. Lee, M.-W. Park, and C. Park, “Utilizing safety rule correlation for mobile scaffolds monitoring leveraging deep convolution neural networks,” Comput. Ind., vol. 129, p. 103448, Aug. 2021, doi: 10.1016/j.compind.2021.103448.
[66] D. Roberts, W. Torres-Calderon, S. Tang, and M. Golparvar-Fard, “Vision-Based Construction Worker Activity Analysis Informed by Body Posture,” J. Comput. Civ. Eng., vol. 34, p. 04020017, Jul. 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000898.
[67] S. Tang, D. Roberts, and M. Golparvar-Fard, “Human-object interaction recognition for automatic construction site safety inspection,” Autom. Constr., vol. 120, p. 103356, Dec. 2020, doi: 10.1016/j.autcon.2020.103356.
[68] S. Han and S. Lee, “A vision-based motion capture and recognition framework for behavior-based safety management,” Autom. Constr., vol. 35, pp. 131–141, Nov. 2013, doi: 10.1016/j.autcon.2013.05.001.
[69] A. Khosrowpour, J. C. Niebles, and M. Golparvar-Fard, “Vision-based workface assessment using depth images for activity analysis of interior construction operations,” Autom. Constr., vol. 48, pp. 74–87, Dec. 2014, doi: 10.1016/j.autcon.2014.08.003.
[70] S. Han, S. Lee, and F. Peña-Mora, “Vision-Based Detection of Unsafe Actions of a Construction Worker: Case Study of Ladder Climbing,” J. Comput. Civ. Eng., vol. 27, pp. 635–644, Nov. 2013, doi: 10.1061/(ASCE)CP.1943-5487.0000279.
[71] S. Tang and M. Golparvar-Fard, “Machine Learning-Based Risk Analysis for Construction Worker Safety from Ubiquitous Site Photos and Videos,” J. Comput. Civ. Eng., vol. 35, Aug. 2021, doi: 10.1061/(ASCE)CP.1943-5487.0000979.
[72] M. Memarzadeh, M. Golparvar-Fard, and J. C. Niebles, “Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors,” Autom. Constr., vol. 32, Jul. 2013, doi: 10.1016/j.autcon.2012.12.002.
[73] S. Tang, M. Golparvar-Fard, M. Naphade, and M. Gopalakrishna, “Video-Based Motion Trajectory Forecasting Method for Proactive Construction Safety Monitoring Systems,” J. Comput. Civ. Eng., vol. 34, p. 04020041, Nov. 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000923.
[74] M. Zhang, Z. Cao, Z. Yang, and X. Zhao, “Utilizing Computer Vision and Fuzzy Inference to Evaluate Level of Collision Safety for Workers and Equipment in a Dynamic Environment,” J. Constr. Eng. Manag., vol. 146, p. 04020051, Jun. 2020, doi: 10.1061/(ASCE)CO.1943-7862.0001802.
[75] J. Li, X. Zhao, G. Zhou, M. Zhang, D. Li, and Y. Zhou, “Evaluating the Work Productivity of Assembling Reinforcement through the Objects Detected by Deep Learning,” Sensors, vol. 21, p. 5598, Aug. 2021, doi: 10.3390/s21165598.
[76] J. Li, G. Zhou, D. Li, M. Zhang, and X. Zhao, “Recognizing workers’ construction activities on a reinforcement processing area through the position relationship of objects detected by faster R-CNN,” Eng. Constr. Archit. Manag., vol. ahead-of-print, Jan. 2022, doi: 10.1108/ECAM-04-2021-0312.
[77] M. Zhang and S. Ge, “Vision and Trajectory–Based Dynamic Collision Prewarning Mechanism for Tower Cranes,” J. Constr. Eng. Manag., vol. 148, Jul. 2022, doi: 10.1061/(ASCE)CO.1943-7862.0002309.
[78] J. Li, X. Zhao, G. Zhou, and M. Zhang, “Standardized use inspection of workers’ personal protective equipment based on deep learning,” Saf. Sci., vol. 150, p. 105689, Jun. 2022, doi: 10.1016/j.ssci.2022.105689.
[79] Z. Li and D. Li, “Action recognition of construction workers under occlusion,” J. Build. Eng., vol. 45, p. 103352, Oct. 2021, doi: 10.1016/j.jobe.2021.103352.
[80] Z. Yang, Y. Yuan, M. Zhang, X. Zhao, Y. Zhang, and B. Tian, “Safety Distance Identification for Crane Drivers Based on Mask R-CNN,” Sensors, vol. 19, p. 2789, Jun. 2019, doi: 10.3390/s19122789.
[81] J. Cai and H. Cai, “Robust Hybrid Approach of Vision-Based Tracking and Radio-Based Identification and Localization for 3D Tracking of Multiple Construction Workers,” J. Comput. Civ. Eng., vol. 34, p. 04020021, May 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000901.
[82] J. Cai, X. Li, X. Liang, W. Wei, and S. Li, Construction Worker Ergonomic Assessment via LSTM-Based Multi-Task Learning Framework. 2022, p. 224. doi: 10.1061/9780784483961.023.
[83] J. Cai, Y. Zhang, L. Yang, H. Cai, and S. Li, “A context-augmented deep learning approach for worker trajectory prediction on unstructured and dynamic construction sites,” Adv. Eng. Inform., vol. 46, p. 101173, Oct. 2020, doi: 10.1016/j.aei.2020.101173.
[84] J. Cai, L. Yang, Y. Zhang, S. Li, and H. Cai, “Multitask Learning Method for Detecting the Visual Focus of Attention of Construction Workers,” J. Constr. Eng. Manag., vol. 147, Apr. 2021, doi: 10.1061/(ASCE)CO.1943-7862.0002071.
[85] J. Canny, “A Computational Approach to Edge Detection,” in Readings in Computer Vision, M. A. Fischler and O. Firschein, Eds., San Francisco, CA: Morgan Kaufmann, 1987, pp. 184–203. doi: 10.1016/B978-0-08-051581-6.50024-6.
[86] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995, doi: 10.1007/BF00994018.
[87] R. Kalman, “A New Approach to Linear Filtering and Prediction Problems,” J. Basic Eng. ASME, vol. 82D, pp. 35–45, Mar. 1960, doi: 10.1115/1.3662552.
[88] P. Felzenszwalb and D. Huttenlocher, “Efficient Graph-Based Image Segmentation,” Int. J. Comput. Vis., vol. 59, pp. 167–181, Sep. 2004, doi: 10.1023/B:VISI.0000022288.19776.77.
[89] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition.” arXiv, Dec. 10, 2015. doi: 10.48550/arXiv.1512.03385.
[90] W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent Neural Network Regularization.” arXiv, Feb. 19, 2015. doi: 10.48550/arXiv.1409.2329.
[91] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” arXiv, Jan. 06, 2016. doi: 10.48550/arXiv.1506.01497.
[92] “GitHub - ultralytics/yolov5: YOLOv5 in PyTorch > ONNX > CoreML > TFLite.” Accessed: Apr. 21, 2023. [Online]. Available: https://github.com/ultralytics/yolov5
[93] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 2980–2988. doi: 10.1109/ICCV.2017.322.
[94] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks.” arXiv, Nov. 26, 2018. doi: 10.48550/arXiv.1611.07004.
[95] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
[96] M. Yang et al., “Transformer-based deep learning model and video dataset for unsafe action identification in construction projects,” Autom. Constr., vol. 146, p. 104703, Feb. 2023, doi: 10.1016/j.autcon.2022.104703.
[97] W. Liu, Y. Shao, S. Zhai, Z. Yang, and P. Chen, “Computer Vision-Based Tracking of Workers in Construction Sites Based on MDNet,” IEICE Trans. Inf. Syst., vol. E106.D, no. 5, pp. 653–661, 2023, doi: 10.1587/transinf.2022DLP0045.
[98] L. Huang, Q. Fu, M. He, D. Jiang, and Z. Hao, “Detection algorithm of safety helmet wearing based on deep learning,” Concurr. Comput. Pract. Exp., vol. 33, no. 13, p. e6234, 2021, doi: 10.1002/cpe.6234.
[99] M. Park, D. Q. Tran, J. Bak, and S. Park, “Small and overlapping worker detection at construction sites,” Autom. Constr., vol. 151, p. 104856, Jul. 2023, doi: 10.1016/j.autcon.2023.104856.
[100] K. Han and X. Zeng, “Deep Learning-Based Workers Safety Helmet Wearing Detection on Construction Sites Using Multi-Scale Features,” IEEE Access, vol. PP, pp. 1–1, Dec. 2021, doi: 10.1109/ACCESS.2021.3138407.
[101] N.-T. Nguyen, D.-Q. Bui, C. Tran, and H. Tran, “Improved detection network model based on YOLOv5 for warning safety in construction sites,” Int. J. Constr. Manag., pp. 1–11, Feb. 2023, doi: 10.1080/15623599.2023.2171836.
[102] H. Peng and Z. Zhang, “Helmet Wearing Recognition of Construction Workers Using Convolutional Neural Network,” Wirel. Commun. Mob. Comput., vol. 2022, pp. 1–8, Apr. 2022, doi: 10.1155/2022/4739897.
[103] J. Jain, R. Parekh, J. Parekh, S. Shah, and P. Kanani, “Helmet Detection and License Plate Extraction Using Machine Learning and Computer Vision,” 2023, pp. 258–268. doi: 10.1007/978-3-031-22405-8_20.
[104] L. Wang et al., “Investigation into Recognition Algorithm of Helmet Violation Based on YOLOv5-CBAM-DCN,” IEEE Access, vol. 10, pp. 1–1, Jan. 2022, doi: 10.1109/ACCESS.2022.3180796.
[105] S. Yue, Q. Zhang, D. Shao, Y. Fan, and J. Bai, “Safety helmet wearing status detection based on improved boosted random ferns,” Multimed. Tools Appl., vol. 81, May 2022, doi: 10.1007/s11042-022-12014-y.
[106] H.-P. Wan, W.-J. Zhang, H.-B. Ge, Y. Luo, and M. D. Todd, “Improved Vision-Based Method for Detection of Unauthorized Intrusion by Construction Sites Workers,” J. Constr. Eng. Manag., vol. 149, no. 7, p. 04023040, Jul. 2023, doi: 10.1061/JCEMD4.COENG-13294.
[107] J. Wu, N. Cai, W. Chen, H. Wang, and G. Wang, “Automatic detection of hardhats worn by construction personnel: A deep learning approach and benchmark dataset,” Autom. Constr., vol. 106, p. 102894, Oct. 2019, doi: 10.1016/j.autcon.2019.102894.
[108] J. Yang, Z. Shi, and Z. Wu, “Vision-based action recognition of construction workers using dense trajectories,” Adv. Eng. Inform., vol. 30, pp. 327–336, Aug. 2016, doi: 10.1016/j.aei.2016.04.009.
[109] B. Xiao, H. Xiao, J. Wang, and Y. Chen, “Vision-based method for tracking workers by integrating deep learning instance segmentation in off-site construction,” Autom. Constr., vol. 136, p. 104148, Apr. 2022, doi: 10.1016/j.autcon.2022.104148.
[110] Y. Piao, W. Xu, and T.-K. Wang, “Dynamic Fall Risk Assessment Framework for Construction Workers Based on Dynamic Bayesian Network and Computer Vision,” J. Constr. Eng. Manag., Oct. 2021, doi: 10.1061/(ASCE)CO.1943-7862.0002200.
[111] J. C. P. Cheng, P. K.-Y. Wong, H. Luo, M. Wang, and P. Leung, “Vision-based monitoring of site safety compliance based on worker re-identification and personal protective equipment classification,” Autom. Constr., vol. 139, p. 104312, Jul. 2022, doi: 10.1016/j.autcon.2022.104312.
[112] B. Yang, Z. Wang, and B. Liu, “Reidentification-Based Automated Matching for 3D Localization of Workers in Construction Sites,” J. Comput. Civ. Eng., vol. 35, Nov. 2021, doi: 10.1061/(ASCE)CP.1943-5487.0000975.
[113] H. Deng, Z. Ou, and Y. Deng, “Multi-Angle Fusion-Based Safety Status Analysis of Construction Workers,” Int. J. Environ. Res. Public Health, vol. 18, p. 11815, Nov. 2021, doi: 10.3390/ijerph182211815.
[114] F. Li, Y. Chen, M. Hu, M. Luo, and G. Wang, “Helmet-Wearing Tracking Detection Based on StrongSORT,” Sensors, vol. 23, p. 1682, Feb. 2023, doi: 10.3390/s23031682.
[115] Y.-R. Lee, S.-H. Jung, K.-S. Kang, H.-C. Ryu, and H.-G. Ryu, “Deep learning-based framework for monitoring wearing personal protective equipment on construction sites,” J. Comput. Des. Eng., vol. 10, no. 2, pp. 905–917, Apr. 2023, doi: 10.1093/jcde/qwad019.
[116] P. Li, F. Wu, S. Xue, and L. Guo, “Study on the Interaction Behaviors Identification of Construction Workers Based on ST-GCN and YOLO,” Sensors, vol. 23, no. 14, p. 6318, Jul. 2023, doi: 10.3390/s23146318.
[117] I. Jeelani, K. Asadi, H. Ramshankar, K. Han, and A. Albert, “Real-time vision-based worker localization & hazard detection for construction,” Autom. Constr., vol. 121, p. 103448, Jan. 2021, doi: 10.1016/j.autcon.2020.103448.
[118] H. Huang, H. Hu, F. Xu, Z. Zhang, and Y. Tao, “Skeleton-based automatic assessment and prediction of intrusion risk in construction hazardous areas,” Saf. Sci., vol. 164, p. 106150, Aug. 2023, doi: 10.1016/j.ssci.2023.106150.
[119] X. Mei, X. Zhou, F. Xu, and Z. Zhang, “Human Intrusion Detection in Static Hazardous Areas at Construction Sites: Deep Learning–Based Method,” J. Constr. Eng. Manag., vol. 149, p. 04022142, Jan. 2023, doi: 10.1061/(ASCE)CO.1943-7862.0002409.
[120] M. Neuhausen, D. Pawlowski, and M. König, “Comparing Classical and Modern Machine Learning Techniques for Monitoring Pedestrian Workers in Top-View Construction Site Video Sequences,” Appl. Sci., vol. 10, p. 8466, Nov. 2020, doi: 10.3390/app10238466.
[121] M. Neuhausen, P. Herbers, and M. König, “Using Synthetic Data to Improve and Evaluate the Tracking Performance of Construction Workers on Site,” Appl. Sci., vol. 10, p. 4948, Jul. 2020, doi: 10.3390/app10144948.
[122] Y. Liu, Z. Zhou, and Y. Wang, A Tracking Method of Multi-Workers Onsite with Kalman Filter and OpenPose. 2021, p. 280. doi: 10.1061/9780784483848.031.
[123] O. Angah and A. Chen, “Tracking multiple construction workers through deep learning and the gradient based method with re-matching based on multi-object tracking accuracy,” Autom. Constr., vol. 119, p. 103308, Nov. 2020, doi: 10.1016/j.autcon.2020.103308.
[124] Y.-J. Lee and M.-W. Park, “3D tracking of multiple onsite workers based on stereo vision,” Autom. Constr., vol. 98, pp. 146–159, Feb. 2019, doi: 10.1016/j.autcon.2018.11.017.
[125] M. Liu, D. Hong, S. Han, and S. Lee, Silhouette-Based On-Site Human Action Recognition in Single-View Video. 2016, p. 959. doi: 10.1061/9780784479827.096.
[126] J. Yang, “Enhancing action recognition of construction workers using data-driven scene parsing,” J. Civ. Eng. Manag., vol. 24, pp. 568–580, Nov. 2018, doi: 10.3846/jcem.2018.6133.
[127] S. Bhokare, L. Goyal, R. Ren, and J. Zhang, “Smart construction scheduling monitoring using YOLOv3-based activity detection and classification,” J. Inf. Technol. Constr., vol. 27, pp. 240–252, Mar. 2022, doi: 10.36680/j.itcon.2022.012.
[128] J. Cai, Y. Zhang, and H. Cai, “Two-step long short-term memory method for identifying construction activities through positional and attentional cues,” Autom. Constr., vol. 106, Jul. 2019, doi: 10.1016/j.autcon.2019.102886.
[129] G. Torabi, A. Hammad, and N. Bouguila, “Two-Dimensional and Three-Dimensional CNN-Based Simultaneous Detection and Activity Classification of Construction Workers,” J. Comput. Civ. Eng., vol. 36, Jul. 2022, doi: 10.1061/(ASCE)CP.1943-5487.0001024.
[130] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields.” arXiv, Apr. 13, 2017. doi: 10.48550/arXiv.1611.08050.
[131] H.-S. Fang et al., “AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time.” arXiv, Nov. 07, 2022. doi: 10.48550/arXiv.2211.03375.
[132] M. Liu, S. Han, and S. Lee, “Tracking-based 3D human skeleton extraction from stereo video camera toward an on-site safety and ergonomic analysis,” Constr. Innov., vol. 16, pp. 348–367, Jul. 2016, doi: 10.1108/CI-10-2015-0054.
[133] J. Seo, R. Starbuck, S. Han, S. Lee, and T. Armstrong, “Motion Data-Driven Biomechanical Analysis during Construction Tasks on Sites,” J. Comput. Civ. Eng., vol. 29, p. B4014005, Oct. 2014, doi: 10.1061/(ASCE)CP.1943-5487.0000400.
[134] J. Seo, A. Alwasel, S. Lee, E. Abdel-Rahman, and C. Haas, “A comparative study of in-field motion capture approaches for body kinematics measurement in construction,” Robotica, vol. 37, pp. 1–19, Dec. 2017, doi: 10.1017/S0263574717000571.
[135] W. Chu, S. Han, X. Luo, and Z. Zhu, “Monocular Vision-Based Framework for Biomechanical Analysis or Ergonomic Posture Assessment in Modular Construction,” J. Comput. Civ. Eng., vol. 34, p. 04020018, Jul. 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000897.
[136] J. Kim, D. Kim, J. Shah, and S. Lee, Synthetic Training Image Dataset for Vision-Based 3D Pose Estimation of Construction Workers. 2022, p. 262. doi: 10.1061/9780784483961.027.
[137] J. Seo and S. Lee, “Automated postural ergonomic risk assessment using vision-based posture classification,” Autom. Constr., vol. 128, p. 103725, Aug. 2021, doi: 10.1016/j.autcon.2021.103725.
[138] E. Chian, Y. M. Goh, J. Tian, and B. Guo, “Dynamic identification of crane load fall zone: A computer vision approach,” Saf. Sci., vol. 156, p. 105904, Dec. 2022, doi: 10.1016/j.ssci.2022.105904.
[139] E. Chian, Y. M. Goh, and J. Tian, “Management of Safe Distancing on Construction Sites During COVID-19: A Smart Real-time Monitoring System,” Comput. Ind. Eng., vol. 163, p. 107847, Dec. 2021, doi: 10.1016/j.cie.2021.107847.
[140] Z. Chen, L. Wu, H. He, Z. Jiao, and L. Wu, “Vision-based Skeleton Motion Phase to Evaluate Working Behavior: Case Study of Ladder Climbing Safety,” Hum.-Centric Comput. Inf. Sci., vol. 12, Jan. 2022, doi: 10.22967/HCIS.2022.12.001.
[141] S. Wu, L. Hou, G. Zhang, and H. Chen, “Real-time mixed reality-based visual warning for construction workforce safety,” Autom. Constr., vol. 139, p. 104252, Jul. 2022, doi: 10.1016/j.autcon.2022.104252.
[142] T. Dang, T. Le, T. Hong, and V. Nguyen, Fast and Accurate Fall Detection and Warning System Using Image Processing Technology. 2021, p. 210. doi: 10.1109/ATC52653.2021.9598204.
[143] P. Hung and N. Su, “Unsafe Construction Behavior Classification Using Deep Convolutional Neural Network,” Pattern Recognit. Image Anal., vol. 31, pp. 271–284, Apr. 2021, doi: 10.1134/S1054661821020073.
[144] P. Zhai, J. Wang, and L. Zhang, “Extracting Worker Unsafe Behaviors from Construction Images Using Image Captioning with Deep Learning–Based Attention Mechanism,” J. Constr. Eng. Manag., vol. 149, Feb. 2023, doi: 10.1061/JCEMD4.COENG-12096.
[145] H. Kim, K. Kim, and H. Kim, “Vision-Based Object-Centric Safety Assessment Using Fuzzy Inference: Monitoring Struck-By Accidents with Moving Objects,” J. Comput. Civ. Eng., vol. 30, p. 04015075, Dec. 2015, doi: 10.1061/(ASCE)CP.1943-5487.0000562.
[146] Y.-S. Shin and J. Kim, “A Vision-Based Collision Monitoring System for Proximity of Construction Workers to Trucks Enhanced by Posture-Dependent Perception and Truck Bodies’ Occupied Space,” Sustainability, vol. 14, p. 7934, Jun. 2022, doi: 10.3390/su14137934.
[147] M. Neuhausen, P. Herbers, and M. König, Synthetic Data for Evaluating the Visual Tracking of Construction Workers. 2020. doi: 10.1061/9780784482865.038.
[148] Q. Hu et al., “Intelligent Framework for Worker-Machine Safety Assessment,” J. Constr. Eng. Manag., vol. 146, p. 04020045, May 2020, doi: 10.1061/(ASCE)CO.1943-7862.0001801.
[149] B. Yang, B. Zhang, Q. Zhang, Z. Wang, M. Dong, and T. Fang, “Automatic detection of falling hazard from surveillance videos based on computer vision and building information modeling,” Struct. Infrastruct. Eng., vol. 18, pp. 1–15, Feb. 2022, doi: 10.1080/15732479.2022.2039217.
[150] Q. Hu, Y. Bai, L. He, J. Huang, H. Wang, and G. Cheng, “Workers’ Unsafe Actions When Working at Heights: Detecting from Images,” Sustainability, vol. 14, p. 6126, May 2022, doi: 10.3390/su14106126.
[151] B. Lee, S. Hong, and H. Kim, “Determination of workers’ compliance to safety regulations using a spatio-temporal graph convolution network,” Adv. Eng. Inform., vol. 56, p. 101942, Apr. 2023, doi: 10.1016/j.aei.2023.101942.
[152] Ministry of Housing and Urban-Rural Development of the People’s Republic of China. Accessed: May 09, 2023. [Online]. Available: https://www.mohurd.gov.cn/
[153] “Industries at a Glance: Construction: NAICS 23: U.S. Bureau of Labor Statistics.” Accessed: May 09, 2023. [Online]. Available: https://www.bls.gov/iag/tgs/iag23.htm
[154] S. Konda, H. M. Tiesman, and A. A. Reichard, “Fatal traumatic brain injuries in the construction industry, 2003−2010,” Am. J. Ind. Med., vol. 59, no. 3, pp. 212–220, 2016, doi: 10.1002/ajim.22557.
[155] Code of practice for selection of personal protective equipment, GB/T 11651-2008. Accessed: May 09, 2023. [Online]. Available: https://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno=0307C61B4CCE89BE4316BCE2A36C57DC
[156] “1926.100 - Head protection. | Occupational Safety and Health Administration.” Accessed: May 09, 2023. [Online]. Available: https://www.osha.gov/laws-regs/regulations/standardnumber/1926/1926.100
[157] M.-W. Park, N. Elsafty, and Z. Zhu, “Hardhat-Wearing Detection for Enhancing On-Site Safety of Construction Workers,” J. Constr. Eng. Manag., vol. 141, p. 04015024, Jan. 2015, doi: 10.1061/(ASCE)CO.1943-7862.0000974.
[158] B. E. Mneymneh, M. Abbas, and H. Khoury, “Evaluation of computer vision techniques for automated hardhat detection in indoor construction safety applications,” Front. Eng. Manag., vol. 5, no. 2, pp. 227–239, Jun. 2018, doi: 10.15302/J-FEM-2018071.
[159] B. E. Mneymneh, M. Abbas, and H. Khoury, “Automated Hardhat Detection for Construction Safety Applications,” Procedia Eng., vol. 196, pp. 895–902, Jan. 2017, doi: 10.1016/j.proeng.2017.08.022.
[160] B. E. Mneymneh, M. Abbas, and H. Khoury, “Vision-Based Framework for Intelligent Monitoring of Hardhat Wearing on Construction Sites,” J. Comput. Civ. Eng., vol. 33, Mar. 2019, doi: 10.1061/(ASCE)CP.1943-5487.0000813.
[161] H. Seong, H. Son, and C. Kim, “A Comparative Study of Machine Learning Classification for Color-based Safety Vest Detection on Construction-Site Images,” KSCE J. Civ. Eng., vol. 22, Sep. 2018, doi: 10.1007/s12205-017-1730-3.
[162] Y. Gu, S. Xu, Y. Wang, and L. Shi, An Advanced Deep Learning Approach for Safety Helmet Wearing Detection. 2019, p. 674. doi: 10.1109/iThings/GreenCom/CPSCom/SmartData.2019.00128.
[163] J. Fu, Y. Chen, and S. Chen, “Design and Implementation of Vision Based Safety Detection Algorithm for Personnel in Construction Site,” in 2018 International Conference on Electrical, Control, Automation and Robotics (ECAR 2018), Lancaster: DEStech Publications, Inc., 2018, pp. 470–476. Accessed: May 07, 2023. [Online]. Available: https://www.webofscience.com/wos/woscc/full-record/WOS:000453087200083
[164] A. Kamoona et al., “Random Finite Set-Based Anomaly Detection for Safety Monitoring in Construction Sites,” IEEE Access, vol. PP, pp. 1–1, Jul. 2019, doi: 10.1109/ACCESS.2019.2932137.
[165] V. Delhi, S. Lal, and A. Thomas, “Detection of Personal Protective Equipment (PPE) Compliance on Construction Site Using Computer Vision Based Deep Learning Techniques,” Front. Built Environ., vol. 6, Sep. 2020, doi: 10.3389/fbuil.2020.00136.
[166] N. Nath, A. Behzadan, and S. Paal, “Deep learning for site safety: Real-time detection of personal protective equipment,” Autom. Constr., vol. 112, p. 103085, Apr. 2020, doi: 10.1016/j.autcon.2020.103085.
[167] H. Wu and Z. Liao, Design and implementation of safety helmet detection system based on computer vision. 2021. doi: 10.1117/12.2626829.
[168] A. Hayat and F. Morgado-Dias, “Deep Learning-Based Automatic Safety Helmet Detection System for Construction Safety,” Appl. Sci., vol. 12, p. 8268, Aug. 2022, doi: 10.3390/app12168268.
[169] Z. Shanti, C.-S. Cho, B. García de Soto, Y.-J. Byon, C. Yeun, and T. Kim, “Real-time monitoring of work-at-height safety hazards in construction sites using drones and deep learning,” J. Safety Res., vol. 83, Oct. 2022, doi: 10.1016/j.jsr.2022.09.011.
[170] M. Alateeq, F. P.P., and M. Ali, “Construction Site Hazards Identification Using Deep Learning and Computer Vision,” Sustainability, vol. 15, p. 2358, Jan. 2023, doi: 10.3390/su15032358.
[171] M. Ferdous and S. M. M. Ahsan, “PPE detector: a YOLO-based architecture to detect personal protective equipment (PPE) for construction sites,” PeerJ Comput. Sci., vol. 8, p. 24, Jun. 2022, doi: 10.7717/peerj-cs.999.
[172] L. Zeng, X. Duan, Y. Pan, and M. Deng, “Research on the algorithm of helmet-wearing detection based on the optimized yolov4,” Vis. Comput., vol. 39, pp. 1–11, May 2022, doi: 10.1007/s00371-022-02471-9.
[173] M. Nain, S. Sharma, and C. Sandeep, “Authentication control system for the efficient detection of hard-hats using deep learning algorithms,” J. Discrete Math. Sci. Cryptogr., vol. 24, pp. 2291–2306, Nov. 2021, doi: 10.1080/09720529.2021.2011109.
[174] Z. Shanti et al., “A Novel Implementation of an AI-Based Smart Construction Safety Inspection Protocol in the UAE,” IEEE Access, vol. PP, pp. 1–1, Dec. 2021, doi: 10.1109/ACCESS.2021.3135662.
[175] Z. Xu, J. Huang, and K. Huang, “A novel computer vision-based approach for monitoring safety harness use in construction,” IET Image Process., vol. 17, Nov. 2022, doi: 10.1049/ipr2.12696.
[176] H. Liang and S. Seo, “Automatic Detection of Construction Workers’ Helmet Wear Based on Lightweight Deep Learning,” Appl. Sci., vol. 12, p. 10369, Oct. 2022, doi: 10.3390/app122010369.
[177] J. Lee and S. Lee, “Construction Site Safety Management: A Computer Vision and Deep Learning Approach,” Sensors, vol. 23, p. 944, Jan. 2023, doi: 10.3390/s23020944.
[178] L. Wu, N. Cai, Z. Liu, A. Yuan, and H. Wang, “A one-stage deep learning framework for automatic detection of safety harnesses in high-altitude operations,” Signal Image Video Process., vol. 17, Apr. 2022, doi: 10.1007/s11760-022-02205-3.
[179] W.-C. Chern, J. Hyeon, T. Nguyen, V. K. Asari, and H. Kim, “Context-aware safety assessment system for far-field monitoring,” Autom. Constr., vol. 149, p. 104779, May 2023, doi: 10.1016/j.autcon.2023.104779.
[180] G. Yan, Q. Sun, J. Huang, and Y. Chen, “Helmet Detection Based on Deep Learning and Random Forest on UAV for Power Construction Safety,” J. Adv. Comput. Intell. Intell. Inform., vol. 25, pp. 40–49, Jan. 2021, doi: 10.20965/jaciii.2021.p0040.
[181] S. Chen and K. Demachi, “Towards on-site hazards identification of improper use of personal protective equipment using deep learning-based geometric relationships and hierarchical scene graph,” Autom. Constr., vol. 125, p. 103619, May 2021, doi: 10.1016/j.autcon.2021.103619.
[182] G. Iannizzotto, L. Lo Bello, and G. Patti, “Personal Protection Equipment detection system for embedded devices based on DNN and Fuzzy Logic,” Expert Syst. Appl., vol. 184, p. 115447, Jun. 2021, doi: 10.1016/j.eswa.2021.115447.
[183] Y. Gu, Y. Wang, L. Shi, N. Li, L. Zhuang, and S. Xu, “Automatic detection of safety helmet wearing based on head region location,” IET Image Process., vol. 15, pp. 2441–2453, Sep. 2021, doi: 10.1049/ipr2.12231.
[184] S. Chen, K. Demachi, and F. Dong, “Graph-based linguistic and visual information integration for on-site occupational hazards identification,” Autom. Constr., vol. 137, p. 104191, May 2022, doi: 10.1016/j.autcon.2022.104191.
[185] Y. Li, H. Wei, Z. Han, J. Huang, and W.-D. Wang, “Deep Learning-Based Safety Helmet Detection in Engineering Management Based on Convolutional Neural Networks,” Adv. Civ. Eng., vol. 2020, pp. 1–10, Sep. 2020, doi: 10.1155/2020/9703560.
[186] “The Benefits of AI In Construction.” Accessed: Nov. 04, 2023. [Online]. Available: https://constructible.trimble.com/construction-industry/the-benefits-of-ai-in-construction
[187] H. Luo, J. Liu, W. Fang, Q. Yu, and Z. Lu, “Real-time smart video surveillance to manage safety: A case study of a transport mega-project,” Adv. Eng. Inform., vol. 45, p. 101100, Aug. 2020, doi: 10.1016/j.aei.2020.101100.
[188] M. Aljanabi, M. Yaseen, A. Ali, S. Abed, and ChatGPT, “ChatGPT: Open Possibilities,” Iraqi J. Comput. Sci. Math., vol. 4, Jan. 2023, doi: 10.52866/ijcsm.2023.01.01.0018.
[189] “Your AI-Powered Copilot for the Web.” Accessed: May 19, 2023. [Online]. Available: https://www.microsoft.com/en-us/bing?form=MA13FJ
JIAQI LI was born in Anshan, Liaoning, China, in 1991. He received the B.S. degree in civil engineering from Liaoning Technical University and the M.S. degree in structural engineering from Shenyang Jianzhu University. In 2022, he received the Ph.D. degree in structural engineering from Dalian University of Technology.
Since 2022, he has been a Lecturer with the School of Civil Engineering, University of Science and Technology Liaoning. His research interests include artificial intelligence-based construction monitoring, deep learning-based building structure damage detection, and earthquake prevention for building structures. He is currently in charge of three research projects and has published eight papers as well as one patent to date.

QI MIAO was born in Anyang, Henan, China, in 2000. She received the B.S. degree in civil engineering from Luoyang Institute of Science and Technology in 2022.
She is currently pursuing the M.S. degree in civil engineering at University of Science and Technology Liaoning. Her research interest is artificial intelligence-based construction safety monitoring.

LIXIAO ZHANG was born in Hebei, China, in 1993. She received the B.S. degree in civil engineering from Northwest A&F University in 2016 and the Ph.D. degree in structural engineering from Dalian University of Technology in 2022.
Since 2022, she has been a Lecturer with the College of Transportation Engineering, Dalian Maritime University. Her research interests include bridge structural health monitoring and structural damage visualization and diagnosis.

ZHAOBO LI was born in Inner Mongolia, China, in 1989. He received the B.S. degree in engineering management from Hulunbuir University in 2014 and the M.S. degree in structural engineering from Shenyang Jianzhu University in 2017.
He is currently pursuing the Ph.D. degree in structural engineering at China University of Mining and Technology. Meanwhile, he is a Senior Engineer with the Hohhot Science and Technology Innovation Service Center. His research interest is the intelligent inspection of building curtain walls. He has published more than 10 academic papers in related fields and has obtained more than 70 national patents, including 15 invention patents, and more than 20 national computer software copyrights.