
This article has been accepted for publication in IEEE Access.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3350773

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2022.Doi Number

A Review of Computer Vision-Based Monitoring Approaches for Construction Workers' Work-Related Behaviors
Jiaqi Li1, Qi Miao1, Zheng Zou2, Huaguo Gao1, Lixiao Zhang2, Zhaobo Li3,4, and Nan Wang1
1 School of Civil Engineering, University of Science and Technology Liaoning, Anshan 114051, China
2 College of Transportation Engineering, Dalian Maritime University, Dalian 116026, China
3 Hohhot Science and Technology Innovation Service Center, Hohhot 010011, China
4 School of Mechanics and Civil Engineering, China University of Mining and Technology, Xuzhou 221116, China

Corresponding author: Jiaqi Li (e-mail: [email protected]).


This work was supported in part by the Outstanding Young Scientist Program of Science and Technology Liaoning under Grant 2023YQ03, and in part by
the University of Science and Technology Liaoning Talent Project Grants.

ABSTRACT Construction workers' behaviors directly affect labor productivity and their own safety, thereby influencing project quality. Recognizing and monitoring construction-related behaviors is therefore crucial for high-quality management and orderly construction site operation. Recent strides in computer vision technology suggest its potential to replace traditional manual supervision approaches. This paper explores research on monitoring construction workers' behaviors using computer vision. Through bibliometrics and content-based analysis, the authors present the latest research in this area from three perspectives: "Detection, Localization, and Tracking for Construction Workers," "Recognition of Workers' Construction Activities," and "Occupational Health and Safety Behavior Monitoring." The volume of literature in this field has increased notably, with safety-related studies predominating, underscoring the concern for occupational health. Among vision algorithms, the use of object detection has grown markedly. Ongoing and future research is anticipated to involve multi-algorithm integration and an emphasis on enhancing robustness. The authors then summarize the review in terms of engineering impact and technical suitability, and analyze the limitations of current research from the perspectives of technical approaches and application scenarios. Finally, future research directions in this field, including generative AI models, are discussed. The authors hope this paper can serve as a valuable reference for both scholars and engineers.

INDEX TERMS Computer vision, Construction worker, Construction behavior, Construction site,
Monitoring

I. INTRODUCTION
The construction industry is recognized as a labor-intensive field that involves multi-tasking operations. On-site workforce management can be challenging due to its cumbersome nature, which has the potential to result in low productivity and high safety risks[1]. Possible inactive construction activities, inefficiency, and irresponsible attitudes of operators have been found to result not only in time and resource wastage but also in economic loss for the entire project, along with a decline in safety and construction quality[2], [3].
Lately, there has been a growing emphasis on proactively evaluating the occupational health and safety challenges confronted by construction laborers, as indicated by research [4]. Furthermore, several countries have recognized the need to strengthen construction supervision regulations at the governmental policy level[5], [6].
Traditionally, professionals conduct on-site inspections and supervise and record work, which often leads to problems such as insufficient coverage, unreasonable personnel scheduling, time-consuming tasks, and subjective supervision results[7]. In recent years, many countries have embraced the concept of intelligent construction, driven by the development of information technology and the Internet of Things[8]-[10]. Thanks to technology and policy, several intelligent technologies, such as sensor technology, audio technology,

VOLUME XX, 2017 1

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

and computer vision technology, are being employed in engineering construction supervision.
The primary types of sensors used for monitoring construction workers are position sensors[11] and acceleration sensors[12]. However, using these sensors requires them to be attached to the worker, which can affect their comfort and increase equipment maintenance costs [13]. The audio-based approach cannot identify construction tasks that produce little or no sound[14]. In the field of civil engineering, computer vision technology has become a high-precision non-contact monitoring method, used in matters such as crack monitoring[15]-[17], strain measurement[18], [19], and damage recognition[20], [21]. Vision-based approaches can also record worker activities in real time, allowing project managers to track construction site productivity, progress, and safety risks, and make timely decisions about project progress. Using computer vision to achieve intelligent control of smart construction sites is not only the next research frontier but also the driving force for the development of smart construction sites[22].
Nonetheless, there have been fewer studies conducted to monitor worker behavior than those assessing building structure damage, as indicated by Mostafa and Hegazy [23]. In recent years, several comprehensive reviews have been conducted, focusing on safety concerns in engineering construction. Zhang et al. [24] and Fang et al. [25] have explored the application of computer vision technology in worker safety. From an economic development perspective, Luo et al. [26] have elaborated on construction safety issues, emphasizing emerging trends like deep learning and interdisciplinary technologies. Additionally, Zeng et al.'s work has pointed to artificial intelligence as a direction in construction safety research[27]. Construction workers' behavior plays a pivotal role in influencing construction quality[2]. Their trajectories, work behaviors, and safety practices need to be seamlessly integrated into intelligent management systems[24], [28]. In recent years, the number of related works using computer vision technology to conduct research from the perspective of construction workers' behavior has been increasing year by year, and the algorithmic techniques are constantly updated and iterated, so a comprehensive review is needed to summarize the latest related literature in the field in order to understand the current research trends and status, challenges, and limitations. At the same time, we hope that this paper can give some reference to engineers to help innovation in construction management.
To achieve this objective, the remainder of the paper is organized as follows: Chapter 2 outlines the methodology employed for the literature search, Chapter 3 describes the literature situation, Chapter 4 presents the computer vision techniques utilized in the reviewed research, Chapter 5 provides application scenarios of computer vision techniques for workers' construction-related behaviors, Chapter 6 lists summaries from the technical and practical application perspectives, and finally, research gaps and limitations are identified, and potential future research directions are suggested.

II. RESEARCH METHODOLOGY
Figure 1 is the technical framework of this article. Initially, the authors establish the study's background in the introduction, providing the groundwork for subsequent research. Following this, a meticulous screening and analysis were conducted on 137 publications published after 2013, all of which are pertinent to the work behaviors of construction workers. Subsequently, based on the research content within these articles, the examination and advancement of knowledge are executed from three perspectives: the localization and tracking of construction workers, the recognition of construction activities, and the monitoring of occupational health and safety behaviors. These three dimensions comprehensively encompass the critical issues of workers' movement trajectories, construction activities, and safety practices, all of which exert a direct or indirect influence on project quality control and the safety of construction personnel. Building upon the insights gleaned from the literature review, a comprehensive synthesis and discussion are presented, elucidating the current state of research and delineating prospective avenues for future investigation.

FIGURE 1. The research logic framework.


The authors conducted a search of relevant literature using Web of Science (WOS). The search string used was: (worker AND construction) AND ("computer vision" OR "vision-based" OR "deep learning" OR "image processing" OR vision). The retrieval process comprised three sequential phases. In the initial stage, utilizing the aforementioned string, 596 scholarly articles from journal publications and conference papers were obtained from the WOS core collection. In the second stage, review papers were excluded, and literature from 2013 to June 2023 was retained, leaving 496 papers. The third stage involved a careful manual search of the abstract, keywords, and body content to exclude content unrelated to "construction workers" and "machine vision," leaving 137 papers for analysis.
Figure 2 illustrates the distribution of the 137 articles analyzed in this study by their year of publication and the number of citations they received in each year. It is evident that the number of publications in this field has grown rapidly since 2015. Moreover, the number of citations reveals that the research outcomes have received significant attention from the academic community. This trend could be attributed to the growing interest in intelligent construction technology and the advancements in computer vision technology since 2010. The popularity of computer vision was further enhanced with the introduction of the Faster R-CNN algorithm in 2015. Figure 3 displays the sources of the papers, including Automation in Construction, Journal of Computing in Civil Engineering, Journal of Construction Engineering and Management, Advanced Engineering Informatics, IEEE Access, etc. These journals are reputable and are likely to continue contributing to the field.

FIGURE 2. The publication frequency across various years.

FIGURE 3. The volume of scholarly articles published in relevant academic journals.
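The three-stage retrieval described above is, in effect, a small filtering pipeline. The sketch below is illustrative only (not the authors' actual tooling); the record fields `year`, `is_review`, and `topic_match` are hypothetical stand-ins, with the manual relevance check of stage three modeled as a boolean flag.

```python
# Illustrative sketch of the review's screening stages on toy records.
# Field names are hypothetical; stage 3 was a manual check in the review.

def screen(records):
    """Apply the second and third filtering stages to bibliographic records."""
    # Stage 2: drop review papers and keep the 2013 - June 2023 window.
    stage2 = [r for r in records
              if not r["is_review"] and 2013 <= r["year"] <= 2023]
    # Stage 3: relevance to "construction workers" and "machine vision",
    # modeled here as a precomputed boolean flag.
    return [r for r in stage2 if r["topic_match"]]

corpus = [
    {"year": 2020, "is_review": False, "topic_match": True},
    {"year": 2011, "is_review": False, "topic_match": True},   # outside window
    {"year": 2019, "is_review": True,  "topic_match": True},   # review paper
    {"year": 2022, "is_review": False, "topic_match": False},  # off-topic
]
kept = screen(corpus)
print(len(kept))  # 1
```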

III. OVERVIEW OF REVIEWED LITERATURES
VOSViewer, a bibliometric software, was utilized to cluster the reviewed papers and identify current topics of interest in the field[29]. The authors utilized VOSViewer for an in-depth analysis of the papers' keywords, which led to the creation of a heat map (depicted in Figure 4). In the process of generating this heat map, each point on the map is assigned a color based on the density of the elements in its vicinity. A higher density


corresponds to a closer proximity to red, while lower density leans toward blue. The magnitude of this density is contingent upon the significance of these elements, providing a rapid overview of critical research areas[27]. It is evident that the central themes in the literature revolve around keywords such as "deep learning," "computer vision," "construction," and "workers," among others.

FIGURE 4. Keywords hotspot heatmap.
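The density coloring behind such heat maps can be sketched as a kernel-density estimate: each map location's value is a distance-weighted sum over nearby keyword items, with heavier (more frequent) keywords contributing more. The snippet below is an illustrative Gaussian-kernel approximation in that spirit, not VOSViewer's exact formula; the coordinates and weights are invented.

```python
import math

def density(point, items, bandwidth=1.0):
    """Gaussian-kernel density of weighted items around a 2D map point
    (a sketch of item-density coloring, not VOSViewer's exact formula)."""
    x, y = point
    total = 0.0
    for ix, iy, weight in items:  # weight ~ keyword occurrence count
        d2 = (x - ix) ** 2 + (y - iy) ** 2
        total += weight * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total

# Two frequent keywords clustered near the origin, one isolated far away:
# the clustered region gets a higher density (redder color) than the outlier.
items = [(0.0, 0.0, 5.0), (0.2, 0.1, 3.0), (5.0, 5.0, 1.0)]
print(density((0.1, 0.0), items) > density((5.0, 5.0), items))  # True
```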

Subsequently, the authors analyzed the literature's authors and their institutions, and the results are visualized in Figure 5 and Figure 6. In the figures, a larger font size and larger nodes correspond to a higher volume of published literature. Various colors are used to denote distinct clusters, while the lines connecting the circles illustrate the interconnectedness between authors and their affiliations through cross-referencing[30]. The filtering condition for this analysis was a minimum of three publications, and the size of the bubbles in the figures indicates that Hong Kong Polytechnic University[31]-[51] and Huazhong University of Science and Technology[32]-[34], [36], [38], [42], [50]-[58] are currently leading the field. Other institutions conducting research in this area include Chung Ang University[59]-[65], University of Illinois[66]-[73], Dalian University of Technology[74]-[79], and more. Figure 5 highlights that Li Heng's team [31]-[36], [38], [39], [41]-[51] at Hong Kong Polytechnic University and Luo Hanbin's team[33], [42], [52]-[58] at Huazhong University of Science and Technology are currently leading the field, with similar colors representing the same research area. Additionally, Li Jiaqi[75], [76], [78], Zhao Xuefeng[74]-[76], [78], [80], and Cai Jiannan[81]-[84] have shown potential in this field.
Figure 7 illustrates the research progress in each country or region, highlighting Mainland China's current leadership in this field. This may be attributed to the rapid growth of China's construction industry since the 21st century, surpassing the United States in terms of construction output by 2012, with the gap in output widening over the following decade. However, this growth has also brought about negative consequences, including safety and environmental concerns, leading China to prioritize enhanced regulation, smart construction, and green construction. As a result, this concerted effort has catalyzed the swift advancement and implementation of computer vision technology in the construction industry. The United States, as the world's largest economy, has also made significant contributions to this field. With its technology powerhouse status, the United States provides fertile ground for the development of artificial intelligence technology. Furthermore, South Korea, Australia, and other countries have also shown significant development in the "computer vision + worker construction behavior + construction" field.


FIGURE 5. Analysis of research authors.

FIGURE 6. Analysis of research institutions.


FIGURE 7. Analysis of research countries or areas.

IV. TECHNOLOGIES

A. OVERVIEW OF COMPUTER VISION TECHNIQUES
Computer Vision, also known as Machine Vision, is a technology that uses mathematical algorithms and computers to automate and simulate human vision. Its primary purpose is to extract valuable information from image and video data and then analyze, process, and understand it.
Traditional computer vision methods rely mainly on image processing and pattern recognition techniques. These methods include feature extraction techniques like edge detection [85]. Image classification methods such as support vector machines (SVM) [86] can be used to classify objects based on their features. Object tracking methods like the Kalman filter [87] can track the position and motion state of objects in consecutive frames. Additionally, segmentation methods based on thresholding, edges, regions, etc. [88] can divide images into different regions or parts. While these traditional methods can be useful for tasks such as image recognition, their application in complex scenarios is limited due to the need for manually designed features.
Currently, deep learning-based computer vision methods have become mainstream. Convolutional neural networks (CNN) and their derivatives, such as ResNet[89], are widely used for various tasks, including image classification, detection, and segmentation. Recurrent neural networks (RNN)[90] are used for sequential data processing, such as video, speech, and text. Object detection networks such as Faster R-CNN [91] and YOLO [92] are used to locate and measure the size of target objects in images, while segmentation networks such as Mask R-CNN [93] are used to segment images into different parts or regions. Generative adversarial networks (GANs)[94] are used for image generation and restoration, as they can produce high-quality images and enhance image restoration. Furthermore, long short-term memory networks (LSTM)[95] can be used for visual tasks such as video classification and object tracking.
The computer vision technologies mentioned above provide powerful support for quickly extracting information from construction sites and identifying construction site conditions and worker behavior.

B. COMPUTER VISION ALGORITHMS' INNOVATION
The construction environment is characterized by its intricate and dynamic nature, where diverse categories of heavy machinery coexist and operate in parallel, alongside considerable personnel mobility. To better adapt computer vision technology to engineering needs, researchers have improved and innovated upon existing research results in the field of computer vision, updating the technical framework. For example, the recognition of steel bar-related activities performed by construction workers was addressed by Luo et al. [52] through the proposition of a three-stream CNN which can capture static spatial features, short-term motion, and long-term motion in video clips. Yang et al. [96] proposed the Spatial Temporal Relation Transformer (STR-Transformer), which better fuses temporal and spatial features in construction video clips. They also created a video clip dataset including seven types of construction workers' behaviors. Fang et al. [56] proposed an improved Faster R-CNN that


achieves better multi-scale detection in detecting and locating workers and equipment on construction sites. Liu et al. [97] proposed a Multi-Domain Convolutional Neural Network to improve the tracking performance of construction workers. Huang et al. [98] achieved better construction worker hardhat wear detection performance by improving YOLO v3. Park et al. [99] introduced DIoU and NMS for YOLO v5, utilizing triplet attention weighting, feature-level expansion, and soft pooling to improve performance and enhance the ability to detect workers at construction sites, especially overlapping objects in complex environments. Other studies have also involved various computer vision methods for algorithm improvements [100]-[106].
Establishing a highly stable vision algorithm model is partly dependent on high-quality training data. While some open-source databases like COCO and ImageNet are available, they have limited applications in the construction industry. To address this limitation, Yang et al. [96] created a video clip dataset comprising seven types of construction workers' behavior. Wu et al. [107] also developed an open-access dataset called GDUT-HWD, which includes 3174 images for hardhat detection. Additionally, several studies have developed new databases, including Yang et al. [108], Luo et al. [52], Tian et al. [44], and Xiong et al. [48]. These efforts have resulted in an increased availability of training data for construction-related computer vision tasks.
Section 4.1 emphasizes that different classes of vision algorithms can fulfill distinct functions, and for construction sites with complex scenes, a single class of algorithm may not suffice for behavior recognition. Thus, combining multiple algorithms is necessary. For instance, Xiao et al. [109] proposed a construction worker tracking method that utilizes the features of various algorithms, such as Mask R-CNN and the Kalman filter, for tracking multiple construction workers, even in the face of challenges like occlusion and feature scale variation. Piao et al. [110] proposed a framework for computer vision methods that incorporates a Dynamic Bayesian Network, Openpose, and Faster R-CNN to assess the fall risk of construction workers during dynamic construction. Roberts et al. [66] developed a computer vision-based method for construction worker activity analysis, which included YOLO v3, Alphapose, and CNN-based I3D. Their findings indicated that incorporating pose estimation enhances construction activity analysis. Fang et al. [38] integrated Faster R-CNN for detecting construction activities, SORT for tracking individuals, and face recognition for confirming the identity of workers, to eventually identify whether there are uncertified workers engaged in irrelevant construction tasks. Li et al. [78] used YOLO v5 to detect Personal Protective Equipment (PPE) and Openpose to detect skeleton joints, incorporating the visual feature information to determine whether the PPE was correctly used, using a 1D-CNN. Ding et al. [58] combined CNN and LSTM to identify construction workers' unsafe behaviors. Cai et al. [84] used Faster R-CNN to detect the head and body orientation of construction workers, and subsequently applied a multi-task learning network to assess the visual attention direction of construction workers. Fang et al. [51] combined Mask R-CNN and a Cascaded Pyramid Network to localize construction workers in monocular vision, then identified their unsafe behaviors. Table 1 presents other papers that integrated multiple algorithms to achieve improved visual recognition.
TABLE I
TYPICAL RESEARCH FOR ALGORITHMS INTEGRATED

Authors            | Objectives                                                              | Algorithms
Cheng et al. [111] | Tracking of construction workers and monitoring PPE usage               | Person ReID + CNN
Wei et al. [54]    | Identify a person's identity from videos captured from construction sites | Spatial attention network + temporal attention networks
Ting et al. [33]   | Predict construction workers' unsafe behaviors                          | SiamMask + improved Social-LSTM + PNPoly
Yan et al. [35]    | Assessing the risk of collisions between construction workers and trucks | Faster R-CNN + 3D bounding box reconstruction
Yang et al. [112]  | 3D localization of workers on construction sites                        | Faster R-CNN + DeepSORT + ReID + triangulation
Luo et al. [41]    | Recognize construction workers' activities                              | YOLO v3 + SORT + 3D CNN + clustering filtering
Deng et al. [113]  | Detect workers and predict their movement trajectories                  | SURF + improved GMM + HOG + SVM
Li et al. [114]    | Hardhat-wearing tracking detection                                      | YOLO v5 + StrongSORT
Lee et al. [115]   | PPE usage detection                                                     | YOLACT with MobileNetv3 + DeepSORT
Li et al. [116]    | Recognize construction workers' activities                              | YOLO + ST-GCN (spatial temporal graph convolutional networks)
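Several of the integrations in Table 1 follow the same detector-plus-tracker pattern: per-frame bounding boxes from a detector (YOLO, Faster R-CNN) are associated with existing tracks, typically by intersection-over-union (IoU). The sketch below illustrates only that association step under stated assumptions — the `[x1, y1, x2, y2]` box format and the 0.3 threshold are invented here, and real SORT additionally uses a Kalman motion model and Hungarian assignment rather than greedy matching.

```python
# Minimal, SORT-inspired sketch: match a frame's detections to tracks by IoU.
# Box format [x1, y1, x2, y2] and the threshold are illustrative assumptions.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match(tracks, detections, threshold=0.3):
    """Greedily pair each track with its best unused detection above threshold."""
    pairs, used = [], set()
    for t_id, t_box in tracks.items():
        best, best_iou = None, threshold
        for d_id, d_box in enumerate(detections):
            score = iou(t_box, d_box)
            if d_id not in used and score >= best_iou:
                best, best_iou = d_id, score
        if best is not None:
            pairs.append((t_id, best))
            used.add(best)
    return pairs

# Track 0 overlaps the first detection; track 1 matches nothing this frame.
tracks = {0: [0, 0, 10, 10], 1: [50, 50, 60, 60]}
detections = [[1, 1, 11, 11], [80, 80, 90, 90]]
print(match(tracks, detections))  # [(0, 0)]
```

Unmatched detections would normally spawn new tracks, and tracks unmatched for several frames would be dropped; those bookkeeping steps are omitted here for brevity.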


V. APPLICATIONS
This section provides an in-depth analysis of the reviewed articles from the application perspective. The literature selected in this paper considers all construction-related behaviors, including workers' movement trajectories, construction activities, and safety behaviors. These behaviors have indirect or direct impacts on project quality control and workers' safety. The section analyzes the literature from three perspectives: construction workers' localization and tracking, workers' construction activities recognition, and monitoring of occupational health and safety behaviors.
Figure 8 illustrates the distribution of literature among various application domains, with the majority of the research dedicated to occupational health and safety behavior monitoring, constituting over 70% of the total literature. This finding suggests that safety in construction is a significant concern among researchers worldwide. Within this domain, researchers show a higher inclination towards detecting Personal Protective Equipment (PPE) use, which constitutes 32.8% of the total reviewed literature.

A. DETECTION, LOCALIZATION AND TRACKING FOR CONSTRUCTION WORKERS
Engineering construction involves multi-tasking operations with a high turnover of personnel, and the distribution of the locations and trajectories of construction workers affects both productivity and efficiency. Thus, detecting and tracking construction workers is a crucial task in construction management.

FIGURE 8. Distribution of the reviewed literature across application domains.

When detecting construction workers, relying on a single algorithm, such as SVM, was often the approach in early studies. For instance, Memarzadeh et al. [72] extracted HOG features and then used SVM to detect construction workers and equipment. On the other hand, Fang et al. [56] and Son et al. [62] both used the Faster R-CNN algorithm for object detection and achieved better performance. Park et al. [99] proposed SOC-YOLO to improve the detection of construction workers in complex scenes when they overlap. Achieving more accurate localization for individual workers on a temporal scale requires combining various algorithms [112], [117]. For instance, Yang et al. [112] added DeepSORT after Faster R-CNN for multi-object detection and then used the Person ReID algorithm for individual worker localization on the time scale.
Some researchers have extended worker detection applications. For example, Huang et al. [118] and Mei et al. [119] used skeleton-based and object detection-based approaches, respectively, to monitor whether a worker has intruded into a prohibited area based on localization. Wei et al. [54] integrated spatial attention networks and temporal attention networks to recognize individual identities. Cai et al. [84] used Faster R-CNN to detect the head and body orientation of construction workers, and then applied a multitask learning network to evaluate the visual attention direction of workers. Yan et al. [39] first detected workers using Faster R-CNN and then applied 3D pose estimation to locate them in 3D space.
The fundamental basis for the implementation of worker tracking lies in the localization and detection of construction personnel, facilitating an enhanced comprehension of labor allocation and mobility within construction sites. The Kalman filtering algorithm has found wide-ranging applications in the field of object tracking, as evidenced by the reviewed literature [109], [113], [120]-[122]. It is common to combine Kalman filtering with object detection algorithms. For example, Neuhausen et al. [120], [121] used YOLO v3 to detect construction workers and then tracked them using the Kalman filter, demonstrating the robustness of the Kalman filter in compensating for the limitations of YOLO v3. Liu et al. [122] proposed a different approach by combining Kalman filtering with the individual pose joint point estimation algorithm, Openpose, to reduce the manual annotation workload. Similar results can be achieved by combining object detection algorithms with other techniques. Son et al. [63] proposed the combination of YOLO v4 and a Siamese Network for construction worker detection and tracking, achieving an accuracy of 0.975. Wan et al. [106] improved YOLO v5 with an attention mechanism, atrous spatial pooling, and universal upsampling to track construction workers by classifying hardhats, which in turn identified unauthorized intrusions. Angah et al. [123] integrated Mask R-CNN, Matching, and Rematching, enabling multiple construction worker tracking on the screen. Liu et al. [97] introduced a multidomain representation for CNNs to improve tracking effects in complex dynamic scenes. Furthermore, researchers have expanded tracking into 3D space based on 2D tracking. Cai et al. [81] proposed a hybrid approach of visual tracking and radio localization to address the problem of easily losing targets in visual recognition. Lee et al. [124] employed two cameras for separate tracking and localization, followed by Entity Matching for 3D tracking.
LSTM networks can analyze input data with a time series component. To predict the action trajectories of construction workers, Cai et al. [83] proposed a context-aware LSTM-based method. Similarly, Tang et al. [73] combined LSTM


with a mixture density network to achieve the prediction of amalgamation of object detection algorithms with
construction workers' action trajectories. complementary methodologies. For instance, they combined
Computer vision-based detection and tracking methods YOLO v3 with SORT and 3D CNN to achieve spatial
have the potential to effectively assist project managers in localization and construction activities recognition of
analyzing the precise location information of construction construction workers [41]. In another study, the researchers
workers. As computer vision technology continues to advance, utilized YOLO v3, SORT, KCF, C3D, and CRF algorithms in
the accuracy and stability of tracking methods will gradually multiple steps, enabling the recognition of construction
improve, leading to enhanced tracking capabilities of activities performed by workers in groups [50]. Considering
individuals in 3D space and improved prediction of action the spatio-temporal properties of construction activities, Li et
trajectories. al. [76] directly employed the Faster R-CNN algorithm to
recognize construction activities in images, simultaneously
B. RECOGNITION OF WORKERS’ CONSTRUCTION detecting construction workers and determining the activities
ACTIVITIES performed by different operators based on the spatial
The accurate recognition of workers' construction activities relationship between individuals and activities. Bhokare et al.
plays a crucial role in enabling project managers to effectively [127] also proposed a method based on YOLO v3 for direct
monitor construction progress and optimize the allocation of recognition of construction activities.
labor resources. It serves as a valuable tool for analyzing labor To effectively integrate construction activities with spatio-
productivity. Previous studies primarily employed traditional temporal information, utilizing video clips for recognition is a
image processing algorithms for construction activities more promising approach. Previous studies have proposed the
recognition. For instance, Liu et al. [125] utilized a silhouette- use of multistream convolutional neural networks [45], [52].
based approach, while Yang et al. [108] employed an SVM In recent years, research methods based on videos have
approach for classifying image features. However, with become more diverse. Roberts et al. [66] incorporated pose
technological advancements, Yang et al. [126] enhanced the estimation algorithms into video-based construction activities
method by incorporating data-driven scene parsing while recognition. Similarly, Cai et al. [128] introduced an attention
retaining the dense trajectories. direction estimation method to identify groups of construction
In the last several years, the advancement of convolutional workers and subsequently classified their activities using
neural networks has led to reduced training costs for object LSTM. Li et al[116] identified three activities of construction
detection algorithms, as well as improved detection speed and workers through YOLO and ST-GCN, which are throwing,
accuracy. These advancements have facilitated the application operating and crossing. Torabi et al. [129] and Yang et al. [96]
of object detection algorithms in construction activities employed YOWO53 and Transformer, respectively,
recognition. Luo et al. [43] employed Faster R-CNN for the demonstrating high recognition accuracy on today's advanced
identification of construction workers and various entities computing hardware. Li and Li [79] applied Openpose and
depicted in images captured at construction sites. They GAN to estimate the complete skeleton joints of construction
employed a relevance network to recognize multiple workers under occlusion, then used ResNet to recognize
construction activities by analyzing the spatial relationship construction activities.
between them. Similar studies include Fang et al. [38], who Table 2 presents the comprehensive details of the reviewed
employed SORT and face recognition for the detection of non- literature in this section. The literature encompasses a broad
certified work, and Li et al. [75], who used CenterNet to detect spectrum of application scenarios, extending beyond the
construction workers and objects to evaluate construction physical activities of construction workers to encompass
productivity while recognizing reinforcement assembly various processes within civil engineering construction. These
activities. technologies hold significant potential for facilitating
Extensive investigations in the domain of construction intelligent construction and enhancing project management.
activity recognition have been conducted by the research
group led by Luo Xiaochun and Li Heng, delving into the
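Several of the approaches above pair a detector with the "spatial relationship between detected boxes" to associate a worker with an activity or piece of equipment. A minimal, illustrative sketch of that idea is an intersection-over-union (IoU) test; the threshold and the association rule below are invented for illustration, not taken from any cited paper:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def assign_activity(worker_box, activity_boxes, threshold=0.1):
    """Attach the overlapping activity region (if any) with the highest IoU."""
    best_label, best_iou = None, threshold
    for label, box in activity_boxes.items():
        score = iou(worker_box, box)
        if score > best_iou:
            best_label, best_iou = label, score
    return best_label
```

For example, a worker box that overlaps a rebar-bench box beyond the threshold would be assigned the corresponding activity label, while a distant formwork box would be ignored.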

TABLE II
CONSTRUCTION ACTIVITIES RECOGNITION RESEARCH DETAILS

Authors | Objectives | Algorithms
Yang et al. (2016) [108] | Workers' construction activities recognition (LayBrick, Transporting, CutPlate, Drilling, TieRebar, Nailing, Plastering, Shoveling, Bolting, Welding, Sawing) | SVM
Liu et al. (2016) [125] | Construction workers' actions (walking, lifting, crawling) | A silhouette-based human action recognition method
Luo et al. (2018) [52] | Workers' construction activities recognition (Steel bending, Transporting, Walking) | An improved three-stream CNN
Yang (2018) [126] | Workers' construction activities recognition (LayBrick, CutPlate, Drilling, TieRebar, Nailing, Plastering, Shoveling, Bolting, Welding, Sawing) | Dense trajectories + data-driven scene parsing
Fang et al. (2018) [38] | Recognize workers' construction activities (Making framework, Installing windows, cutting rebar, mixing concrete, pouring concrete, operating excavator, et al.) and detect non-certified work | Faster R-CNN + relationship between detected boxes + SORT + face recognition
Luo et al. (2018) [43] | Recognize workers' construction activities (Placing concrete, machining or transferring formwork, building formwork, transporting goods, et al.) | Faster R-CNN + relationship between detected boxes + relevance network
Luo et al. (2018) [45] | Recognize workers' construction activities (measuring, preparing, fixing formwork, connecting rebar, fixing rebar, transporting rebar, welding rebar, et al.) | Two-stream CNN
Luo et al. (2019) [41] | Recognize workers' construction activities (standing, bending, squatting, transporting, pulling, leveling, et al.) | YOLO v3 + SORT + 3D CNN + clustering algorithm
Luo et al. (2019) [49] | Recognize workers' construction activities (standing and carrying, leveling, moving, leaning forward, leaning backward, sitting, standing with hands empty) | Bayesian nonparametric learning
Cai et al. (2019) [128] | Recognize working groups and their group construction activities (Spotting, road paving, et al.) | Positional cues + attentional cues + LSTM
Roberts et al. (2020) [66] | Construction worker activity analysis (bricklaying and plastering operations) | YOLO v3 + Alphapose + CNN-based i3D
Tang et al. (2020) [67] | Safety inspection and construction activities recognition | Faster R-CNN + detected boxes + human-object interaction
Liu et al. (2020) [40] | Recognize workers' construction activities (Masonry work, cart transportation, rebar work, plastering, tiling) | Sentence generation (CNN + RNN + word embedding)
Luo et al. (2020) [50] | Recognize workers' construction activities in groups (Moving with hands empty, sitting and resting, transporting, placing concrete, cutting formwork, placing rebar, et al.) | YOLO v3 + SORT + KCF + C3D + CRF
Li et al. (2021) [75] | Recognize workers' construction activity (assembling reinforcement) and evaluate the work productivity | CenterNet + detected boxes + productivity evaluation
Li et al. (2022) [76] | Recognize workers' construction activities in a reinforcement processing area (Straighten, check, transferring, process) | Faster R-CNN + spatial relationship between detected boxes
Torabi et al. (2022) [129] | Recognize workers' construction activities (Placing/fixing rebars, drilling, hammering, standing, walking, transporting, et al.) | YOWO
Bhokare et al. (2022) [127] | Recognize workers' construction activities (Brick laying, Carpentry, Excavating, Concrete Screeding) | YOLO v3
Yang et al. (2023) [96] | Recognize workers' construction activities (Smoke, stand, climb ladder, wear helmet, fall down, walk, talk, et al.) | STR-Transformer
Li and Li (2021) [79] | Recognize construction activities of workers under occlusion (Driving truck, Transporting cement, Checking the power socket, Cleaning up the plank, Lashing rebar, Paving concrete, Installing scaffolding, Smearing plaster) | Openpose + GAN + ResNet50
Li et al. [116] | Recognize workers' behaviors (throwing, operating, crossing) | YOLO + ST-GCN
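The video-based methods in Table II aggregate per-frame evidence over time. The simplest possible temporal pooling, a majority vote over per-frame labels, illustrates the idea; the cited works replace this with learned aggregation (3D CNNs, C3D, LSTMs), so the sketch below is only an illustrative baseline:

```python
from collections import Counter

def clip_activity(frame_labels):
    """Aggregate per-frame activity labels into one clip-level label by
    majority vote. Learned spatio-temporal models (3D CNNs, LSTMs)
    replace this fixed rule with trainable aggregation."""
    if not frame_labels:
        raise ValueError("empty clip")
    return Counter(frame_labels).most_common(1)[0][0]
```

A clip whose frames are mostly labeled "rebar work" with a few spurious "walking" frames would still be classified as rebar work, which is exactly the noise-suppression benefit that motivates clip-level recognition.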

C. OCCUPATIONAL HEALTH AND SAFETY BEHAVIOR MONITORING FOR CONSTRUCTION WORKERS
1) POSE ESTIMATION
The analysis of construction workers' body posture enables the understanding of their fatigue level, safety condition, and construction behavior. While algorithms such as Openpose [130] and Alphapose [131] offer rapid and accurate estimation of 2D skeleton joint points, 2D pose estimation may not fully capture the nuanced movements of construction workers. Consequently, researchers in the engineering field have placed greater emphasis on pose estimation in 3D space.

Earlier studies relied on 3D depth cameras, as vision algorithms were not yet as advanced. Han et al. [68] employed a depth camera to acquire data and reconstruct the 3D coordinates of construction workers' joint points, facilitating the detection of perilous actions performed while ascending ladders. Khosrowpour et al. [69] also employed a depth camera to acquire 3D joint point data and performed activity classification. Liu et al. [132] utilized a stereo video camera to capture 3D skeleton joint points. In an early-stage approach, Seo et al. [133] transformed motion capture data obtained from vision-based methods for biomechanical analysis. In another study, Seo et al. [37] employed shape- and radial-histogram-based features to estimate the posture of construction workers. Additionally, Seo et al. [134] compared multiple pose estimation methods and concluded that the sensor-based approach yielded the least error.

In the past few years, the rise of deep learning methods has facilitated direct vision-based pose estimation. Zhang et al. [46] employed a multi-stage convolutional neural network to estimate the 3D pose of construction workers using monocular vision. Yu et al. [36] utilized a vision-based approach to obtain the 3D spatial pose of construction workers and further collected ergonomic information through an intelligent insole, enabling the analysis of their physical condition. In another study, Yu et al. [47] curated a construction posture dataset encompassing various postures to facilitate 3D posture estimation. Chu et al. [135] developed an ergonomic 3D posture assessment framework for construction workers, which integrated tracking algorithms, 2D detection, and 3D body generation. Kim et al. [136] constructed a synthetic dataset to enhance 3D pose estimation, while Tian et al. [44] compiled a construction worker motion dataset that can contribute to 3D pose recognition. Table 3 lists typical studies in pose estimation.
TABLE III
TYPICAL RESEARCH FOR POSE ESTIMATION

Authors | Objectives | Algorithms | 2D/3D
Khosrowpour et al. (2014) [69] | Pose estimation and activity classification | 3D pose estimation + activity classifier + Hidden Markov Model | 3D
Zhang et al. (2018) [46] | Pose estimation | Multi-stage convolutional neural network | 3D
Yu et al. (2019) [31] | Pose estimation | 3D human body detection + REBA (Rapid Entire Body Assessment) | 3D
Yu et al. (2019) [34] | Pose estimation + physical fatigue assessment | 3D motion capture algorithm + biomechanical analysis | 3D
Yu et al. (2020) [47] | Dataset creation + pose estimation | A 3D pose dataset of diversified working poses + residual artificial neural network | 3D
Seo et al. (2021) [137] | Posture classification | Shape-based features + radial histograms of silhouettes + SVM | 2D
Cai et al. (2022) [82] | Pose estimation | Pose estimation + LSTM + multitask learning | 2D
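Ergonomic scores such as REBA (Table III) are driven by joint angles recovered from estimated keypoints. The underlying computation, the angle at a joint formed by three keypoints, can be sketched as follows; it assumes keypoints are already available from a pose estimator and is illustrative only:

```python
import math

def joint_angle(a, joint, b):
    """Angle (degrees) at `joint` formed by keypoints a-joint-b,
    e.g. hip-knee-ankle for a knee flexion estimate."""
    v1 = (a[0] - joint[0], a[1] - joint[1])
    v2 = (b[0] - joint[0], b[1] - joint[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosang = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cosang))
```

With 3D keypoints the same formula applies with three-component vectors; an ergonomic assessment then maps angle ranges to risk scores per the REBA tables.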

2) UNSAFE BEHAVIOR RECOGNITION
During construction tasks, it is crucial to monitor the actions of construction workers, as they may engage in hazardous movements or enter dangerous areas due to various factors. Such behaviors pose significant safety risks and require vigilant monitoring.

Computer vision techniques facilitate the extraction of postural features related to construction workers' unsafe behaviors and the assessment of safety by analyzing workers' spatial location in relation to other entities. A research team from Huazhong University of Science and Technology developed a safety monitoring framework for the Wuhan Metro construction project, which effectively detects and predicts unsafe behaviors, including instances of construction workers illegally walking on structural supports [32], [33], [53], [57]. The risk of falls is a major concern for construction workers, and Piao et al. [110] employed a Dynamic Bayesian Network, Openpose, and Faster R-CNN to detect fall risks step by step. During hoisting operations, workers risk being struck by objects in dangerous areas; Chian et al. [138] utilized CenterNet to detect and track construction workers entering these hazardous zones. During the Covid-19 pandemic, Chian et al. [139] also employed CenterNet to monitor safe distances between construction workers. Ladder climbing poses a potential fall risk, which has been addressed by Han et al. [70], Anjum et al. [59], Ding et al. [58], and Chen et al. [140] using various computer vision methodologies. Furthermore, several computer vision-based studies have contributed to the recognition of unsafe behaviors, encompassing safety risk determination [71], [141], fall detection [142], and identification of unsafe actions [48], [65], [143], [144].

Construction sites often involve the simultaneous operation of various large machinery and random stacking of materials, increasing the risk of collisions between construction workers and these objects. Applying computer vision to assess collision risks between workers and machinery or materials therefore plays a vital role in ensuring workers' safety. In earlier studies, Kim et al. [145] employed Gaussian mixture models to identify collision risks between construction workers and mechanical equipment. Fang et al. [51] utilized Mask R-CNN and a Cascaded Pyramid Network to locate the relationships between construction workers, equipment, and materials using monocular vision. Zhang et al. [74] applied Faster R-CNN to detect workers and machines at construction sites, obtained their location coordinates, and determined collision risks through fuzzy inference. Yan et al. [35] employed the Faster R-CNN algorithm to detect key points of construction trucks, followed by 3D reconstruction to establish distance recognition in the 3D scene. Shin et al. [146] integrated YOLO v3 and the Openpose algorithm to identify collision risks between construction workers and trucks. Zhang and Ge [77] employed the latest Transformer technology to identify dynamic collision risks between construction workers and tower cranes. Additionally, other studies have contributed to collision risk identification [64], [147], [148] and fall risk identification [149]. Table 4 presents some of the typical research literature in this field.

TABLE IV
TYPICAL RESEARCH FOR UNSAFE BEHAVIOR RECOGNITION

Authors | Objectives | Algorithms
Fang et al. (2019) [53] | Construction workers' unsafe behavior recognition (walking through structural supports) | Mask R-CNN + overlapping detection module
Ting et al. (2021) [33] | Predict construction workers' unsafe behaviors (walking through structural supports) | SiamMask + improved Social-LSTM + PNPoly
Mei et al. (2023) [119] | Construction workers' unsafe behavior recognition (entering dangerous areas) | YOLO v5
Ding et al. [58] | Construction workers' unsafe behavior recognition (during ladder climbing) | CNN + LSTM
Hu et al. [150] | Construction workers' unsafe behaviors when working at height (lying, relying, jumping, throwing, with no helmet) | Faster R-CNN
Kim et al. [145] | Evaluate risk of collision between construction workers and machinery | GMM + Kalman filtering + fuzzy inference
Zhang et al. (2020) [74] | Evaluate risk of collision between construction workers and machinery | Faster R-CNN + fuzzy inference
Yan et al. (2020) [35] | Evaluate risk of collision between construction workers and trucks | Faster R-CNN + 3D bounding box reconstruction
Shin et al. (2022) [146] | Evaluate risk of collision between construction workers and trucks | YOLO v3 + Openpose + perception-based safety ellipse
Zhang and Ge (2022) [77] | Evaluate risk of collision between construction workers and cranes | FairMOT + Transformer
Lee et al. [151] | Check whether behavior complies with safety regulations (self-checking for PPE, responding to the warning alarm, checking the joint, step board inspection, and hooking the safety hook) | Skeleton-based method + spatio-temporal graph convolutional network
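At their core, the collision-risk checks in Table IV monitor the distance between a worker and a machine and map it to a risk level; the cited works do this with fuzzy inference over detector outputs. The crude threshold-based stand-in below illustrates the mapping only, and the distance thresholds are invented for illustration:

```python
import math

def risk_level(worker_xy, machine_xy, danger=2.0, warning=5.0):
    """Map the worker-to-machine distance (in metres, hypothetical
    thresholds) to a discrete risk level. Fuzzy inference replaces
    these hard cut-offs with graded membership functions."""
    d = math.dist(worker_xy, machine_xy)
    if d < danger:
        return "high"
    if d < warning:
        return "medium"
    return "low"
```

In practice the coordinates come from detection (and, for monocular setups, from 3D reconstruction as in Yan et al. [35]), and velocity or heading can be folded into the rule to make the risk dynamic.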

3) PPE’S USAGE INSPECTION instance, Park et al. [157] utilized image processing methods
Safety accidents are prevalent in engineering construction, to extract the spatial relationship between construction
with a significant number of casualties resulting from fall workers and hardhats. They subsequently employed SVM to
accidents and object strikes, as reported by the Ministry of detect and match this feature, determining whether
Urban and Rural Construction of the People's Republic of construction workers were wearing hardhats or not.
China and the U.S. Bureau of Labor Statistics [152]-[154]. To Mneymneh et al. [158]-[160] extracted various image features
mitigate the occurrence of such casualties, various countries such as SURF and employed template matching and cascade
have implemented mandatory policies and regulations classifiers for hardhat detection. In order to achieve better
regarding personal protective equipment (PPE). For instance, performance in safety vest detection, Seong et al. [161]
the State Administration of Quality Supervision, Inspection compared various features and classifier algorithms. The
and Quarantine of China and the U.S. Occupational Safety and results demonstrated that combining the C4.5 classifier with
Health Administration have specified the use of hardhats YCbCr and SVM classifier yielded superior outcomes. Similar
during construction operations [155], [156], and the to previous application areas, deep learning methods are
requirement for safety harnesses when working at heights. rapidly replacing traditional techniques. Fang et al. [55]
With the advancements in artificial intelligence technology, employed Faster R-CNN to detect the usage of safety
numerous computer vision-based studies have emerged to harnesses by construction workers at heights. In another study
address the detection of PPE usage. conducted by the same group, Fang et al. [42] employed Faster
Before the rapid advancement of deep learning techniques, R-CNN for far-field monitoring at construction sites to detect
traditional methods for processing image features and machine hardhat wearing. The utilization of the Faster R-CNN
learning algorithms were more commonly employed. For algorithm has significantly enhanced image recognition

VOLUME XX, 2017 3

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3350773

Author Name: Preparation of Papers for IEEE Access (February 2017)

accuracy [162]. Fu et al. [163] employed this algorithm for joints. They employed Scene graph representation to describe
hardhat wearing detection, while Kamoona et al. [164] applied the spatial relationship between joints and PPE, enabling the
it to detect high-visibility vests. detection of multiple PPE items such as hardhats, dust masks,
As deep learning algorithms continue to advance, Delhi et safety glasses, and safety belts. Li et al. [78] introduced the
al. [165] applied YOLO v3 for detecting the usage of hardhats hierarchy of control and emphasized the importance of
and jackets among construction workers. YOLO v3 is an ensuring correct usage of PPE from an administrative control
algorithm that has made significant strides in the field of object perspective. By combining the features extracted by YOLO v5
detection, leading to extensive research employing or and Openpose, they employed 1D-CNN for classification and
enhancing this algorithm to detect the personal protective detection of incorrect usage of hardhats and safety harnesses.
equipment (PPE) of construction workers [98], [102], [166], Table 5 provides further details on some typical studies.
[167]. Among them, Nath et al. [166] developed three distinct
models based on the YOLO v3 algorithm. In their second D. SUMMARY
model, the algorithm simultaneously detected individual This section presents an overview of computer vision
workers and verified PPE compliance using a single CNN applications for recognizing construction worker behavior on
framework, achieving a mean average precision (mAP) of construction sites, primarily focusing on occupational health
72.3%. Additionally, apart from v3, the method has been and safety risks. These articles represent significant
continuously updated to incorporate algorithms such as advancements in utilizing computer vision technology to
YOLO v4 and v5, which exhibit even better performance in enhance both productivity and safety in engineering
hardhat detection[100], [101], [104], [168]-[173]. Nguyen et construction. Compared to other sensing and monitoring
al. [101] created a dataset comprising 11,978 images and technologies, computer vision technology offers immense
tested various versions of YOLO v5. Experimental results potential for development, cost-effectiveness, and a wide
indicated that the enhanced YOLO v5s demonstrated the best range of applications. According to survey results, CNNs
detection performance, achieving a precision of 0.74 on their emerge as the most widely employed technique among
custom dataset. researchers, leveraging their ability to rapidly extract features
To enhance detection accuracy, researchers have explored compared to traditional machine learning algorithms. With the
various algorithms [61], [105], [174]-[179] and integrated continuous evolution of CNNs, object detection technology
multiple algorithms to address the PPE monitoring has been extensively adopted, demonstrating strong
challenge[60], [78], [111], [114], [115], [180]-[184]. Some adaptability in monitoring worker construction behavior. It
scholars have also developed open-source PPE usage datasets has achieved breakthroughs in detection performance and real-
for the broader academic community [107]. Xu et al. [175] time monitoring, enabling the tracking of construction
proposed a novel detection strategy called the matching- workers' location, activities, and safety risks. As object
recheck strategy and evaluated safety harness detection using detection technology continues to advance and Transformer
the newly introduced Efficient YOLO v5 on a custom dataset, technology gains prominence, computer vision technology
with 94% mAP. Chen and Demachi [184] utilized YOLO v3 holds the promise of making even greater contributions to the
and Openpose to extract features related to PPE and individual field's development.

TABLE V
TYPICAL RESEARCH FOR PPE'S USAGE INSPECTION

Authors | Objectives | Algorithms | Performance
Park et al. (2015) [157] | Monitoring hardhats' usage | Geometric and spatial relationship between individuals and hardhats + SVM | 94.3% precision for issuing safety alerts
Fang et al. (2018) [42] | Detecting non-hardhat use | Faster R-CNN | > 90%
Kamoona et al. (2019) [164] | Detect high-visibility vests' usage | Faster R-CNN + anomaly detection | 98% F1 score
Nath et al. (2020) [166] | Detect construction workers' PPE usage (hardhat, vest) | YOLO v3 | 72.3% mAP
Nguyen et al. (2023) [101] | Detect construction workers' PPE usage (hardhats, masks, goggles, gloves, suit, shoes) | Improved YOLO v5 | 0.74 precision; 0.66 recall; 0.70 F1 score
Wang et al. (2022) [104] | Monitoring hardhats' usage | Improved YOLO v5 | 91.6% accuracy; 29 FPS detection speed
Li et al. (2020) [185] | Monitoring hardhats' usage | SSD-MobileNet | 95% precision; 77% recall
Yue et al. (2022) [105] | Monitoring hardhats' usage | Improved boosted random ferns | 92.74% accuracy
Gu et al. (2021) [183] | Monitoring hardhats' usage | YOLO v4 + Openpose | 95.1% precision
Cheng et al. (2022) [111] | Detect PPE usage (hardhat, vest) | Person ReID + PPE detection | 87.33% accuracy
Iannizzotto et al. (2021) [182] | Detect PPE usage (helmet, mask, vest) | YOLO-based DNN + fuzzy logic | > 85%
Li et al. (2022) [78] | PPE standardized usage inspection (hardhat, safety harness) | YOLO v5 + Openpose + 1D-CNN | 0.9467 accuracy
Chen et al. (2022) [184] | PPE proper use detection (hardhat, glasses, dust mask) | NLP + YOLO v4 + graph representation | 98.54% precision; 88.79% recall
Li et al. (2023) [114] | PPE usage detection | YOLO v5 + StrongSORT | 95.4% mAP
Chern et al. [179] | Detect PPE usage (strap, hardhat, harness, hook) | Semantic / depth estimation model | 78.5% / 86.22% accuracy
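The geometric approach of Park et al. [157], and later detector-plus-keypoint hybrids, rest on a simple spatial test: a worn hardhat's box should sit in the head region of the worker's box. A hedged sketch of that test is given below; the "upper 30%" head band is an assumption for illustration, not a value from the cited studies:

```python
def hardhat_worn(worker_box, hardhat_box, head_band=0.3):
    """Return True if the hardhat box centre falls inside the upper
    `head_band` fraction of the worker box. Boxes are (x1, y1, x2, y2)
    in image coordinates, with y increasing downwards."""
    cx = (hardhat_box[0] + hardhat_box[2]) / 2
    cy = (hardhat_box[1] + hardhat_box[3]) / 2
    x1, y1, x2, y2 = worker_box
    head_limit = y1 + head_band * (y2 - y1)
    return x1 <= cx <= x2 and y1 <= cy <= head_limit
```

A hardhat detected near a worker's feet (e.g., carried or lying on the ground) fails this test, which is why such spatial rules catch "detected but not worn" cases that a plain object detector would miss.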

VI. DISCUSSION
This article presents the latest research outcomes concerning the use of computer vision techniques for recognizing construction workers' behavior. Considering prevailing trends and the global emphasis on the high-quality development of infrastructure, we anticipate that this research avenue will continue to garner considerable enthusiasm in the coming years. In this section, we introduce the impact of the relevant technologies on construction and undertake a comparative analysis of the appropriateness of various algorithmic approaches. We then summarize current research gaps and delve into possible future research directions. Notably, we incorporate insights from generative AI models, a highly prevalent trend this year, providing an additional perspective and offering recommendations for researchers engaged in related pursuits.

A. IMPACT ON ENGINEERING AND CONSTRUCTION
Artificial intelligence (AI) has spearheaded the evolution of intelligent construction, enhancing the efficiency of construction operations. Some practitioners assert that applying AI technology can yield cost savings of up to 20% in building construction [186]. Over recent years, with the continual refinement of algorithms and methods, a growing number of outcomes are finding practical applications in engineering. China, boasting the largest construction industry economy, has witnessed growth in engineering applications. Noteworthy instances include the successful implementation of results from Luo Hanbin's team at Huazhong University of Science and Technology in the Wuhan metro project [53], [56]-[58]. The established quality and safety management platform comprehensively monitors on-site construction conditions, provides real-time updates on quality and safety risks, and guides dynamic adjustments to the construction process [187] (refer to Figure 9). Additionally, the outcomes of Li et al. [75], [76] and Yang et al. [80] were integrated into a super high-rise construction project in the Donggang Business District of Dalian City in collaboration with China Construction Eighth Engineering Division. This integration reduced project managers' workload and indirectly contributed to zero accidents throughout the project. While some results cited in this paper may not explicitly showcase integration into real projects, the data collected originates from actual construction sites and holds the potential for further development into tangible products [43], [84], [99]. Technology companies such as Baidu and HikVision have launched commercial solutions for smart construction sites, aligning with the computer vision trend. In contrast to university scholars' emphasis on methodological innovation, commercial solutions prioritize functionality. At the current stage, the swift progress of AI technology has influenced public decision-making [8]-[10], and the concept of smart sites is expected to remain a focal point in the engineering and construction field. With increased investment in research and development, it is foreseeable that novel methodologies will expedite technological integration with commercial solutions, further enriching the body of knowledge.

FIGURE 9. Wuhan metro monitoring system[187].

B. TECHNICAL SUITABILITY
One of the challenges hindering the widespread adoption of computer vision technology in the engineering construction field is the method's applicability. The construction site environment is intricate and varies significantly from one project to another; factors such as lighting conditions and occlusion substantially affect an algorithm's recognition outcomes. Consequently, the algorithm's robustness becomes a critical consideration.
In the realm of computer vision, earlier algorithms predominantly relied on image processing and machine learning techniques, exemplified by HOG features [72] and the support vector machine (SVM) [108], [157]. However, these algorithms suffer from sluggish image feature extraction, an inability to handle training on extensive data, low robustness, and limited engineering applicability. CNNs, featuring weight sharing and automated feature extraction, can be trained effectively on extensive datasets. However, when a dataset primarily represents a single scene composition, applicability to other projects may be compromised [35], [50], [80]; in such instances, remedying this limitation requires secondary training. Conversely, if the dataset encompasses diverse lighting conditions, CNNs address such challenges effectively. Notably, methods using Faster R-CNN have yielded satisfactory results under varied lighting conditions [42], [76]. The emergence of larger-scale datasets related to construction behavior further substantiates the success of algorithms such as Faster R-CNN and YOLO [44], [48], [52], [107], [108]. Methodological remedies also exist for applicability issues such as occlusion [79]. As algorithms mature, the focus on improving robustness becomes pertinent. For instance, Park et al. [99] enhanced YOLOv5 with multiple strategies, enabling the detection of obscured and overlapping construction workers even in low-light environments (Figure 10). With continued algorithmic and methodological advancements, coupled with ongoing dataset updates, the prospects for addressing the technique's robustness appear promising.

FIGURE 10. Detection results from the SOC-YOLO model [99].

C. RESEARCH GAPS AND POSSIBLE FUTURE RESEARCH DIRECTIONS
1) TECHNICAL APPROACHES
Considering the profound influence of visual algorithm performance on the precision of behavior recognition, enhancing the accuracy and speed of visual algorithms remains a pivotal trajectory for future development in this domain. For CNN-based methods, accuracy can be enhanced by modifying the network structure, while computational speed can be increased through alterations in convolution techniques or the implementation of alternative methods, such as refining the loss function.
While CNN-based methods excel at local feature extraction, they often lack robustness to data noise and deformation. Presently, Transformer technology, grounded in a self-attention mechanism, demonstrates superior modeling ability and enhanced robustness, and its application has begun to permeate the field of engineering and construction. Nevertheless, the Transformer model's intricate structure and numerous parameters pose challenges. Hence, a plausible future direction involves developing a lightweight Transformer model tailored to the construction domain; potential measures include incorporating lightweight attention modules and using convolution or pooling operations to reduce sequence length, among other strategies.
The integration of various computer vision algorithms can effectively tackle intricate behavior recognition tasks, constituting a methodologically sound approach. However, to align with the practical demands of engineering applications, future research could explore establishing a unified specification for operational environments and normalized data formats applicable to diverse computer vision algorithms. Alternatively, novel algorithms or monitoring strategies geared toward multifunctional real-time monitoring could be developed.
2) APPLICATION SCENARIOS
From the reviewed articles, the predominant focus has been on addressing occupational health and safety risks within the construction context. There is a noticeable dearth of studies pertaining to the tracking of construction workers and the recognition of construction activities. While understanding the location distribution and movement trajectories of workers proves valuable for visualizing the labor force's distribution status and aiding resource allocation, vision-based tracking encounters challenges in preventing the loss of targets across spatio-temporal scales. This predicament has led to increased attention to re-identification (ReID) algorithms in recent years. Future research directions could involve the development and exploration of ReID algorithms tailored to construction workers, or the pursuit of alternative methods with comparable functionality.
Construction activity recognition can intuitively reflect the productivity status of the site, and should be given enough


attention in future research, which can establish a large-scale construction activities database for the development of computer vision-based approaches.
Engineering construction sites present intricate scenarios, with environmental variations observed across different locations. Some research is carried out in laboratory settings for image acquisition and testing, while other studies collect data from specific engineering projects. Consequently, the performance and effectiveness of these approaches cannot be assured when applied to different sites. To illustrate, for Personal Protective Equipment (PPE) usage detection, numerous studies report accuracy rates surpassing 90%; however, these results are often obtained on self-constructed test datasets rather than open-source public datasets.
Future research endeavors should focus on constructing extensive public datasets encompassing diverse facets of worker construction behavior. Such a comprehensive dataset should span static images and dynamic videos, covering a spectrum of construction scenarios across various project types. It is strongly advised that researchers employ these public datasets during the development of algorithmic frameworks. Doing so would significantly contribute to the broader comprehension of algorithms among researchers, fostering collaboration and knowledge exchange.
3) GENERATIVE AI'S RESPONSE
In 2023, generative AI emerged as a prominent field within AI, with ChatGPT [188] and NewBing [189] standing out as two highly popular generative AI models. ChatGPT, developed by OpenAI, is a conversational AI model built upon the GPT-3.5 series models, which were trained using reinforcement learning techniques. NewBing, on the other hand, is an interactive tool offered by Microsoft that incorporates ChatGPT technology to provide more innovative insights and suggestions.
To gather Q&A information related to generative AI, Dr. Yang Zhaozheng from the University of Strasbourg adopted both NewBing and ChatGPT, directing specific inquiries to each model. It should be noted that the statements presented by the generative AI models do not reflect the perspectives of Dr. Yang or of any of the contributors to this article.
Here are some suggestions for future research directions given by the generative AI models:
a. Enhance object detection and recognition algorithms for construction workers in complex indoor environments.
b. Improve accuracy and robustness under variations in lighting conditions, occlusions, camera movements, and background noise.
c. Combine computer vision with technologies such as deep learning, IoT, augmented reality, and digital twins for better performance in progress monitoring systems.
d. Extend progress monitoring to cover not only physical aspects but also psychological, social, and environmental factors affecting workers' well-being and behavior.
e. Establish standard benchmarks and evaluation metrics for comparing different computer vision-based progress monitoring methods and systems.
f. Integrate multi-sensor data, such as wearable sensors or GPS, with visual data for a more comprehensive understanding of workers' behavior.
g. Develop worker-centered monitoring approaches that respect privacy and autonomy.
h. Explore robust monitoring systems that can adapt to changing environments, including lighting conditions and weather.
i. Implement real-time monitoring and intervention systems to provide immediate feedback to workers and prevent potential hazards.
Moreover, generative AI models offer substantial potential benefits. Excellent design capabilities are exemplified by Cadence's Allegro X AI technology, a generative AI model that facilitates the optimization of the system design process. Furthermore, the capacity of generative AI models holds promise for creating algorithms to recognize construction workers' behaviors. When this technology matures, an AI model could seamlessly execute automated testing of product features, including compatibility testing for software applications. Additionally, in the realm of construction management, where decision-making is integral, generative AI models can prove invaluable by providing targeted advice.

VII. CONCLUSIONS
This paper presents a comprehensive review of the literature pertaining to the monitoring of construction workers' behavior using computer vision technology. Initially, the researchers conducted a keyword search on the WOS platform, resulting in 596 papers from the past decade; through manual inspection, 137 papers were selected for further analysis. The distribution of publication years indicates that the field is undergoing rapid development, with ongoing updates to technical methods. The authors employed VOSviewer software to identify research hotspots and highlight notable researchers and institutions in the field. The paper provides an overview of computer vision techniques and technological advancements in worker behavior recognition. The reviewed literature is categorized into three sections based on application scenarios: "Detection, localization and tracking for construction workers," "Recognition of workers' construction activities," and "Occupational health and safety behavior monitoring." The abundance of literature demonstrates the preference of current researchers for employing computer vision to address safety concerns in engineering construction. The analysis reveals a trend wherein vision-based approaches are beginning to be adopted in real-world engineering, impacting decision-making processes in civil engineering. Initially, across different application scenarios, early research inclined towards addressing challenges through conventional image processing methods and machine-learning classification.


However, as Convolutional Neural Networks (CNNs) and target detection techniques matured, vision methods exhibited enhanced robustness, adept at handling behavior recognition issues even in intricate scenarios, leading to the swift displacement of traditional methods. Recent years have witnessed an increase in the integration of multiple algorithms to address real-world challenges, marking a prevalent trend. Additionally, the emergence of Transformer technology holds the promise of transcending existing algorithmic limitations in this domain.
The authors conclude by summarizing research gaps and outlining future directions. Further optimization is required in terms of accuracy and speed, especially when integrating multiple algorithms, necessitating the establishment of a unified framework platform and integration strategy. In terms of application scenarios, improvements can be made to enhance scene applicability and diversify data sources. Furthermore, insights from the generative AI models underscore the importance of considering the privacy and autonomy of construction workers, suggesting their involvement in monitoring projects. This paper serves as a reference for engineers and researchers, aiming to contribute to the field of engineering and construction while benefiting fellow scholars' scientific studies.

ACKNOWLEDGMENT
Thanks are due to Dr. Yang Zhaozheng from the University of Strasbourg for assistance with the generative AI models.

REFERENCES
[1] Y. Zhang, T. Wang, and K.-V. Yuen, "Construction site information decentralized management using blockchain and smart contracts," Comput.-Aided Civ. Infrastruct. Eng., vol. 37, no. 11, pp. 1450–1467, 2022, doi: 10.1111/mice.12804.
[2] M. Yihua and X. Tuo, "Research of 4M1E's effect on engineering quality based on structural equation model," Syst. Eng. Procedia, vol. 1, pp. 213–220, Jan. 2011, doi: 10.1016/j.sepro.2011.08.034.
[3] H. Zhou, Y. Zhao, Q. Shen, L. Yang, and H. Cai, "Risk assessment and management via multi-source information fusion for undersea tunnel construction," Autom. Constr., vol. 111, p. 103050, Mar. 2020, doi: 10.1016/j.autcon.2019.103050.
[4] C. Sheehan, R. Donohue, T. Shea, B. Cooper, and H. De Cieri, "Leading and lagging indicators of occupational health and safety: The moderating role of safety leadership," Accid. Anal. Prev., vol. 92, pp. 130–138, Jul. 2016, doi: 10.1016/j.aap.2016.03.018.
[5] Opinions on promoting the sustainable and healthy development of the construction industry, General Office of the State Council. Accessed: Apr. 20, 2023. [Online]. Available: http://www.gov.cn/zhengce/zhengceku/2017-02/24/content_5170625.htm
[6] "Bringing Innovation to the Worksite with 'Smart Construction'," The Government of Japan, JapanGov. Accessed: Apr. 20, 2023. [Online]. Available: https://www.japan.go.jp/tomodachi/2017/autumn2017/power_of_innovation.html
[7] T. Dawood, Z. Zhu, and T. Zayed, "Computer Vision–Based Model for Moisture Marks Detection and Recognition in Subway Networks," J. Comput. Civ. Eng., vol. 32, no. 2, p. 04017079, Mar. 2018, doi: 10.1061/(ASCE)CP.1943-5487.0000728.
[8] L. Xu, E. Xu, and L. Li, "Industry 4.0: State of the art and future trends," Int. J. Prod. Res., vol. 56, pp. 1–22, Mar. 2018, doi: 10.1080/00207543.2018.1444806.
[9] "Construction 2025: strategy," GOV.UK. Accessed: Apr. 20, 2023. [Online]. Available: https://www.gov.uk/government/publications/construction-2025-strategy
[10] Notice on the issuance of the "14th Five-Year Plan" for the development of the construction industry. Accessed: Apr. 20, 2023. [Online]. Available: http://www.gov.cn/zhengce/zhengceku/2022-01/27/content_5670687.htm
[11] A. Montaser and O. Moselhi, "RFID indoor location identification for construction projects," Autom. Constr., vol. 39, pp. 167–179, Apr. 2014, doi: 10.1016/j.autcon.2013.06.012.
[12] N. Nath, R. Akhavian, and A. Behzadan, "Ergonomic analysis of construction worker's body postures using wearable mobile sensors," Appl. Ergon., vol. 62, pp. 107–117, Jul. 2017, doi: 10.1016/j.apergo.2017.02.007.
[13] L. Joshua and K. Varghese, "Accelerometer-Based Activity Recognition in Construction," J. Comput. Civ. Eng., vol. 25, pp. 370–379, Sep. 2011, doi: 10.1061/(ASCE)CP.1943-5487.0000097.
[14] Y. Lee, M. Scarpiniti, and A. Uncini, "Advanced Sound Classifiers and Performance Analyses for Accurate Audio-Based Construction Project Monitoring," J. Comput. Civ. Eng., vol. 34, p. 04020030, Sep. 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000911.
[15] Y. Zhang, Y.-Q. Ni, X. Jia, and Y.-W. Wang, "Identification of concrete surface damage based on probabilistic deep learning of images," Autom. Constr., vol. 156, p. 105141, Dec. 2023, doi: 10.1016/j.autcon.2023.105141.
[16] Y. Tang et al., "Novel visual crack width measurement based on backbone double-scale features for improved detection automation," Eng. Struct., vol. 274, p. 115158, Jan. 2023, doi: 10.1016/j.engstruct.2022.115158.
[17] Z. Wu, Y. Tang, B. Hong, B. Liang, and Y. Liu, "Enhanced Precision in Dam Crack Width Measurement: Leveraging Advanced Lightweight Network Identification for Pixel-Level Accuracy," Int. J. Intell. Syst., vol. 2023, p. e9940881, Sep. 2023, doi: 10.1155/2023/9940881.
[18] X. Botao, L. Zhang, M. Ding, W. Li, and X. Zhao, "Strain measurement based on cooperative operation with different smartphones," Comput.-Aided Civ. Infrastruct. Eng., vol. 38, Sep. 2022, doi: 10.1111/mice.12919.
[19] Y. Tang et al., "Seismic performance evaluation of recycled aggregate concrete-filled steel tubular columns with field strain detected via a novel mark-free vision method," Structures, vol. 37, pp. 426–441, Mar. 2022, doi: 10.1016/j.istruc.2021.12.055.
[20] Y. Zhang, X. Zhao, and P. Liu, "Multi-Point Displacement Monitoring Based on Full Convolutional Neural Network and Smartphone," IEEE Access, vol. 7, pp. 139628–139634, 2019, doi: 10.1109/ACCESS.2019.2943599.
[21] Y. Zhang and K.-V. Yuen, "Bolt damage identification based on orientation-aware center point estimation network," Struct. Health Monit., vol. 21, p. 147592172110042, Mar. 2021, doi: 10.1177/14759217211004243.
[22] W. Ma, "Technical framework of energy-saving construction management of intelligent building based on computer vision algorithm," Soft Comput., May 2023, doi: 10.1007/s00500-023-08424-1.
[23] K. Mostafa and T. Hegazy, "Review of image-based analysis and applications in construction," Autom. Constr., vol. 122, p. 103516, Feb. 2021, doi: 10.1016/j.autcon.2020.103516.
[24] M. Zhang, R. Shi, and Z. Yang, "A critical review of vision-based occupational health and safety monitoring of construction site workers," Saf. Sci., vol. 126, p. 104658, Jun. 2020, doi: 10.1016/j.ssci.2020.104658.
[25] W. Fang, P. E. D. Love, H. Luo, and L. Ding, "Computer vision for behaviour-based safety in construction: A review and future directions," Adv. Eng. Inform., vol. 43, p. 100980, Jan. 2020, doi: 10.1016/j.aei.2019.100980.
[26] F. Luo, R. Y. M. Li, M. J. C. Crabbe, and R. Pu, "Economic development and construction safety research: A bibliometrics approach," Saf. Sci., vol. 145, p. 105519, Jan. 2022, doi: 10.1016/j.ssci.2021.105519.
[27] L. Zeng and R. Y. M. Li, "Construction safety and health hazard awareness in Web of Science and Weibo between 1991 and 2021," Saf. Sci., vol. 152, p. 105790, Aug. 2022, doi: 10.1016/j.ssci.2022.105790.


[28] D. Fang, Y. Huang, H. Guo, and H. W. Lim, "LCB approach for construction safety," Saf. Sci., vol. 128, p. 104761, Aug. 2020, doi: 10.1016/j.ssci.2020.104761.
[29] U. A. Bukar, M. S. Sayeed, S. F. A. Razak, S. Yogarayan, O. A. Amodu, and R. A. R. Mahmood, "A method for analyzing text using VOSviewer," MethodsX, vol. 11, Dec. 2023, doi: 10.1016/j.mex.2023.102339.
[30] N. J. van Eck and L. Waltman, "Software survey: VOSviewer, a computer program for bibliometric mapping," Scientometrics, vol. 84, no. 2, pp. 523–538, Aug. 2010, doi: 10.1007/s11192-009-0146-3.
[31] Y. Yu, X. Yang, H. Li, X. Luo, H. Guo, and Q. Fang, "Joint-Level Vision-Based Ergonomic Assessment Tool for Construction Workers," J. Constr. Eng. Manag., vol. 145, May 2019, doi: 10.1061/(ASCE)CO.1943-7862.0001647.
[32] H. Wu, B. Zhong, H. Li, and N. Zhao, "Combining computer vision with semantic reasoning for on-site safety management in construction," J. Build. Eng., vol. 42, p. 103036, Oct. 2021, doi: 10.1016/j.jobe.2021.103036.
[33] K. Ting, W. Fang, H. Luo, S. Xu, and H. Li, "Computer vision and long short-term memory: Learning to predict unsafe behaviour in construction," Adv. Eng. Inform., vol. 50, p. 101400, Oct. 2021, doi: 10.1016/j.aei.2021.101400.
[34] Y. Yu, H. Li, X. Yang, L. Kong, X. Luo, and A. Wong, "An automatic and non-invasive physical fatigue assessment method for construction workers," Autom. Constr., vol. 103, pp. 1–12, Jul. 2019, doi: 10.1016/j.autcon.2019.02.020.
[35] X. Yan, H. Zhang, and H. Li, "Computer vision-based recognition of 3D relationship between construction entities for monitoring struck-by accidents," Comput.-Aided Civ. Infrastruct. Eng., pp. 1–16, Sep. 2020, doi: 10.1111/mice.12536.
[36] Y. Yu et al., "Automatic Biomechanical Workload Estimation for Construction Workers by Computer Vision and Smart Insoles," J. Comput. Civ. Eng., vol. 33, May 2019, doi: 10.1061/(ASCE)CP.1943-5487.0000827.
[37] J. Seo, K. Yin, and S. Lee, Automated Postural Ergonomic Assessment Using a Computer Vision-Based Posture Classification. 2016, p. 818. doi: 10.1061/9780784479827.082.
[38] Q. Fang et al., "A deep learning-based method for detecting non-certified work on construction sites," Adv. Eng. Inform., vol. 35, pp. 56–68, Jan. 2018, doi: 10.1016/j.aei.2018.01.001.
[39] X. Yan, H. Zhang, and H. Li, "Estimating Worker-Centric 3D Spatial Crowdedness for Construction Safety Management Using a Single 2D Camera," J. Comput. Civ. Eng., vol. 33, p. 04019030, Jun. 2019, doi: 10.1061/(ASCE)CP.1943-5487.0000844.
[40] H. Liu, G. Wang, T. Huang, P. He, M. Skitmore, and X. Luo, "Manifesting construction activity scenes via image captioning," Autom. Constr., vol. 119, p. 103334, Nov. 2020, doi: 10.1016/j.autcon.2020.103334.
[41] X. Luo, H. Li, H. Wang, Z. Wu, F. Dai, and D. Cao, "Vision-based detection and visualization of dynamic workspaces," Autom. Constr., vol. 104, pp. 1–13, Aug. 2019, doi: 10.1016/j.autcon.2019.04.001.
[42] Q. Fang et al., "Detecting non-hardhat-use by a deep learning method from far-field surveillance videos," Autom. Constr., vol. 85, pp. 1–9, Jan. 2018, doi: 10.1016/j.autcon.2017.09.018.
[43] X. Luo, H. Li, D. Cao, F. Dai, J. Seo, and S. Lee, "Recognizing Diverse Construction Activities in Site Images via Relevance Networks of Construction Related Objects Detected by Convolutional Neural Networks," J. Comput. Civ. Eng., vol. 32, Nov. 2017, doi: 10.1061/(ASCE)CP.1943-5487.0000756.
[44] Y. Tian, H. Li, H. Cui, and J. Chen, "Construction motion data library: an integrated motion dataset for on-site activity recognition," Sci. Data, vol. 9, Nov. 2022, doi: 10.1038/s41597-022-01841-1.
[45] X. Luo, H. Li, D. Cao, Y. Yu, X. Yang, and T. Huang, "Towards efficient and objective work sampling: Recognizing workers' activities in site surveillance videos with two-stream convolutional networks," Autom. Constr., vol. 94, pp. 360–370, Oct. 2018, doi: 10.1016/j.autcon.2018.07.011.
[46] H. Zhang, X. Yan, and H. Li, "Ergonomic posture recognition using 3D view-invariant features from single ordinary camera," Autom. Constr., vol. 94, pp. 1–10, Oct. 2018, doi: 10.1016/j.autcon.2018.05.033.
[47] Y. Yu, H. Li, J. Cao, and X. Luo, "Three-Dimensional Working Pose Estimation in Industrial Scenarios With Monocular Camera," IEEE Internet Things J., vol. PP, pp. 1–1, Aug. 2020, doi: 10.1109/JIOT.2020.3014930.
[48] R. Xiong, Y. Song, H. Li, and Y. Wang, "Onsite Video Mining for Construction Hazards Identification with Visual Relationships," Adv. Eng. Inform., vol. 42, Jul. 2019, doi: 10.1016/j.aei.2019.100966.
[49] X. Luo, H. Li, X. Yang, Y. Yu, and D. Cao, "Capturing and Understanding Workers' Activities in Far-Field Surveillance Videos with Deep Action Recognition and Bayesian Nonparametric Learning," Comput.-Aided Civ. Infrastruct. Eng., vol. 34, Oct. 2018, doi: 10.1111/mice.12419.
[50] X. Luo, H. Li, Y. Yu, C. Zhou, and D. Cao, "Combining deep features and activity context to improve recognition of activities of workers in groups," Comput.-Aided Civ. Infrastruct. Eng., vol. 35, Feb. 2020, doi: 10.1111/mice.12538.
[51] Q. Fang, H. Li, X. Luo, C. Li, and W. An, "A sematic and prior-knowledge-aided monocular localization method for construction-related entities," Comput.-Aided Civ. Infrastruct. Eng., vol. 35, Mar. 2020, doi: 10.1111/mice.12541.
[52] H. Luo, C. Xiong, W. Fang, P. E. D. Love, B. Zhang, and X. Ouyang, "Convolutional neural networks: Computer vision-based workforce activity assessment in construction," Autom. Constr., vol. 94, pp. 282–289, Oct. 2018, doi: 10.1016/j.autcon.2018.06.007.
[53] W. Fang, B. Zhong, N. Zhao, H. Luo, J. Xue, and S. Xu, "A deep learning-based approach for mitigating falls from height with computer vision: Convolutional neural network," Adv. Eng. Inform., vol. 39, pp. 170–177, Jan. 2019, doi: 10.1016/j.aei.2018.12.005.
[54] R. Wei, W. Fang, H. Luo, and S. Xu, "Recognizing People's Identity in Construction Sites with Computer Vision: A Spatial and Temporal Attention Pooling Network," Adv. Eng. Inform., vol. 42, Aug. 2019, doi: 10.1016/j.aei.2019.100981.
[55] W. Fang, L. Ding, and H. Luo, "Falls from Heights: A Computer Vision-based Approach for Safety Harness Detection," Autom. Constr., vol. 91, pp. 53–61, Feb. 2018, doi: 10.1016/j.autcon.2018.02.018.
[56] W. Fang, L. Ding, B. Zhong, P. E. D. Love, and H. Luo, "Automated detection of workers and heavy equipment on construction sites: A convolutional neural network approach," Adv. Eng. Inform., vol. 37, pp. 139–149, Aug. 2018, doi: 10.1016/j.aei.2018.05.003.
[57] J. Liu, W. Fang, T. Hartmann, H. Luo, and L. Wang, "Detection and location of unsafe behaviour in digital images: A visual grounding approach," Adv. Eng. Inform., vol. 53, p. 101688, Aug. 2022, doi: 10.1016/j.aei.2022.101688.
[58] L. Ding, W. Fang, H. Luo, B. Zhong, and X. Ouyang, "A Deep Hybrid Learning Model to Detect Unsafe Behavior: Integrating Convolution Neural Networks and Long Short-Term Memory," Autom. Constr., vol. 86, p. 124, Feb. 2018, doi: 10.1016/j.autcon.2017.11.002.
[59] S. Anjum, N. Khan, R. Khalid, M. Khan, L. Dongmin, and C. Park, "Fall Prevention From Ladders Utilizing a Deep Learning-Based Height Assessment Method," IEEE Access, vol. 10, pp. 1–1, Jan. 2022, doi: 10.1109/ACCESS.2022.3164676.
[60] M. Khan, R. Khalid, S. Anjum, S. Tran, and C. Park, "Fall Prevention from Scaffolding Using Computer Vision and IoT-Based Monitoring," J. Constr. Eng. Manag., vol. 148, pp. 1–15, Apr. 2022, doi: 10.1061/(ASCE)CO.1943-7862.0002278.
[61] J. Lim, D. Jung, C. Park, and D. Kim, "Computer Vision Process Development regarding Worker's Safety Harness and Hook to Prevent Fall Accidents: Focused on System Scaffolds in South Korea," Adv. Civ. Eng., vol. 2022, Jul. 2022, doi: 10.1155/2022/4678479.
[62] H. Son, H. Choi, H. Seong, and C. Kim, "Detection of construction workers under varying poses and changing background in image sequences via very deep residual networks," Autom. Constr., vol. 99, pp. 27–38, Mar. 2019, doi: 10.1016/j.autcon.2018.11.033.
[63] H. Son and C. Kim, "Integrated worker detection and tracking for the safe operation of construction machinery," Autom. Constr., vol. 126, p. 103670, Jun. 2021, doi: 10.1016/j.autcon.2021.103670.
[64] H. Son, H. Seong, H. Choi, and C. Kim, "Real-Time Vision-Based Warning System for Prevention of Collisions between Workers and Heavy Equipment," J. Comput. Civ. Eng., vol. 33, p. 04019029, Sep. 2019, doi: 10.1061/(ASCE)CP.1943-5487.0000845.


[65] N. Khan, M. R. Saleem, D. Lee, M.-W. Park, and C. Park, "Utilizing safety rule correlation for mobile scaffolds monitoring leveraging deep convolution neural networks," Comput. Ind., vol. 129, p. 103448, Aug. 2021, doi: 10.1016/j.compind.2021.103448.
[66] D. Roberts, W. Torres-Calderon, S. Tang, and M. Golparvar-Fard, "Vision-Based Construction Worker Activity Analysis Informed by Body Posture," J. Comput. Civ. Eng., vol. 34, p. 04020017, Jul. 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000898.
[67] S. Tang, D. Roberts, and M. Golparvar-Fard, "Human-object interaction recognition for automatic construction site safety inspection," Autom. Constr., vol. 120, p. 103356, Dec. 2020, doi: 10.1016/j.autcon.2020.103356.
[68] S. Han and S. Lee, "A vision-based motion capture and recognition framework for behavior-based safety management," Autom. Constr., vol. 35, pp. 131–141, Nov. 2013, doi: 10.1016/j.autcon.2013.05.001.
[69] A. Khosrowpour, J. C. Niebles, and M. Golparvar-Fard, "Vision-based workface assessment using depth images for activity analysis of interior construction operations," Autom. Constr., vol. 48, pp. 74–87, Dec. 2014, doi: 10.1016/j.autcon.2014.08.003.
[70] S. Han, S. Lee, and F. Peña-Mora, "Vision-Based Detection of Unsafe Actions of a Construction Worker: Case Study of Ladder Climbing," J. Comput. Civ. Eng., vol. 27, pp. 635–644, Nov. 2013, doi: 10.1061/(ASCE)CP.1943-5487.0000279.
[71] S. Tang and M. Golparvar-Fard, "Machine Learning-Based Risk Analysis for Construction Worker Safety from Ubiquitous Site Photos and Videos," J. Comput. Civ. Eng., vol. 35, Aug. 2021, doi: 10.1061/(ASCE)CP.1943-5487.0000979.
[72] M. Memarzadeh, M. Golparvar-Fard, and J. C. Niebles, "Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors," Autom. Constr., vol. 32, Jul. 2013, doi: 10.1016/j.autcon.2012.12.002.
[73] S. Tang, M. Golparvar-Fard, M. Naphade, and M. Gopalakrishna, "Video-Based Motion Trajectory Forecasting Method for Proactive Construction Safety Monitoring Systems," J. Comput. Civ. Eng., vol. 34, p. 04020041, Nov. 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000923.
[74] M. Zhang, Z. Cao, Z. Yang, and X. Zhao, "Utilizing Computer Vision and Fuzzy Inference to Evaluate Level of Collision Safety for Workers and Equipment in a Dynamic Environment," J. Constr. Eng. Manag., vol. 146, p. 04020051, Jun. 2020, doi: 10.1061/(ASCE)CO.1943-7862.0001802.
[75] J. Li, X. Zhao, G. Zhou, M. Zhang, D. Li, and Y. Zhou, "Evaluating
[83] … and dynamic construction sites," Adv. Eng. Inform., vol. 46, p. 101173, Oct. 2020, doi: 10.1016/j.aei.2020.101173.
[84] J. Cai, L. Yang, Y. Zhang, S. Li, and H. Cai, "Multitask Learning Method for Detecting the Visual Focus of Attention of Construction Workers," J. Constr. Eng. Manag., vol. 147, Apr. 2021, doi: 10.1061/(ASCE)CO.1943-7862.0002071.
[85] J. Canny, "A Computational Approach to Edge Detection," in Readings in Computer Vision, M. A. Fischler and O. Firschein, Eds., San Francisco, CA: Morgan Kaufmann, 1987, pp. 184–203. doi: 10.1016/B978-0-08-051581-6.50024-6.
[86] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995, doi: 10.1007/BF00994018.
[87] R. Kalman, "A New Approach to Linear Filtering and Prediction Problems," J. Basic Eng. ASME, vol. 82D, pp. 35–45, Mar. 1960, doi: 10.1115/1.3662552.
[88] P. Felzenszwalb and D. Huttenlocher, "Efficient Graph-Based Image Segmentation," Int. J. Comput. Vis., vol. 59, pp. 167–181, Sep. 2004, doi: 10.1023/B:VISI.0000022288.19776.77.
[89] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition." arXiv, Dec. 10, 2015. doi: 10.48550/arXiv.1512.03385.
[90] W. Zaremba, I. Sutskever, and O. Vinyals, "Recurrent Neural Network Regularization." arXiv, Feb. 19, 2015. doi: 10.48550/arXiv.1409.2329.
[91] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." arXiv, Jan. 06, 2016. doi: 10.48550/arXiv.1506.01497.
[92] "GitHub - ultralytics/yolov5: YOLOv5 in PyTorch > ONNX > CoreML > TFLite." Accessed: Apr. 21, 2023. [Online]. Available: https://github.com/ultralytics/yolov5
[93] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 2980–2988. doi: 10.1109/ICCV.2017.322.
[94] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks." arXiv, Nov. 26, 2018. doi: 10.48550/arXiv.1611.07004.
[95] "Long Short-Term Memory," Neural Computation, MIT Press. Accessed: Apr. 24, 2023. [Online]. Available: https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory
[96] M. Yang et al., "Transformer-based deep learning model and video dataset for unsafe action identification in construction projects," Autom. Constr., vol. 146, p. 104703, Feb. 2023, doi:
the Work Productivity of Assembling Reinforcement through the 10.1016/j.autcon.2022.104703.
Objects Detected by Deep Learning,” Sensors, vol. 21, p. 5598, Aug. [97] W. Liu, Y. Shao, S. Zhai, Z. Yang, and P. Chen, “Computer Vision-
2021, doi: 10.3390/s21165598. Based Tracking of Workers in Construction Sites Based on MDNet,”
[76] J. Li, G. Zhou, D. Li, M. Zhang, and X. Zhao, “Recognizing workers’ IEICE Trans. Inf. Syst., vol. E106.D, no. 5, pp. 653–661, 2023, doi:
construction activities on a reinforcement processing area through the 10.1587/transinf.2022DLP0045.
position relationship of objects detected by faster R-CNN,” Eng. Constr. [98] L. Huang, Q. Fu, M. He, D. Jiang, and Z. Hao, “Detection algorithm of
Archit. Manag., vol. ahead-of-print, Jan. 2022, doi: 10.1108/ECAM- safety helmet wearing based on deep learning,” Concurr. Comput. Pract.
04-2021-0312. Exp., vol. 33, no. 13, p. e6234, 2021, doi: 10.1002/cpe.6234.
[77] M. Zhang and S. Ge, “Vision and Trajectory–Based Dynamic Collision [99] M. Park, D. Q. Tran, J. Bak, and S. Park, “Small and overlapping
Prewarning Mechanism for Tower Cranes,” J. Constr. Eng. Manag., vol. worker detection at construction sites,” Autom. Constr., vol. 151, p.
148, Jul. 2022, doi: 10.1061/(ASCE)CO.1943-7862.0002309. 104856, Jul. 2023, doi: 10.1016/j.autcon.2023.104856.
[78] J. Li, X. Zhao, G. Zhou, and M. Zhang, “Standardized use inspection [100] K. Han and X. Zeng, “Deep Learning-Based Workers Safety Helmet
of workers’ personal protective equipment based on deep learning,” Saf. Wearing Detection on Construction Sites Using Multi-Scale Features,”
Sci., vol. 150, p. 105689, Jun. 2022, doi: 10.1016/j.ssci.2022.105689. IEEE Access, vol. PP, pp. 1–1, Dec. 2021, doi:
[79] Z. Li and D. Li, “Action recognition of construction workers under 10.1109/ACCESS.2021.3138407.
occlusion,” J. Build. Eng., vol. 45, p. 103352, Oct. 2021, doi: [101] N.-T. Nguyen, D.-Q. Bui, C. Tran, and H. Tran, “Improved detection
10.1016/j.jobe.2021.103352. network model based on YOLOv5 for warning safety in construction
[80] Z. Yang, Y. Yuan, M. Zhang, X. Zhao, Y. Zhang, and B. Tian, “Safety sites,” Int. J. Constr. Manag., pp. 1–11, Feb. 2023, doi:
Distance Identification for Crane Drivers Based on Mask R-CNN,” 10.1080/15623599.2023.2171836.
Sensors, vol. 19, p. 2789, Jun. 2019, doi: 10.3390/s19122789. [102] H. Peng and Z. Zhang, “Helmet Wearing Recognition of Construction
[81] J. Cai and H. Cai, “Robust Hybrid Approach of Vision-Based Tracking Workers Using Convolutional Neural Network,” Wirel. Commun. Mob.
and Radio-Based Identification and Localization for 3D Tracking of Comput., vol. 2022, pp. 1–8, Apr. 2022, doi: 10.1155/2022/4739897.
Multiple Construction Workers,” J. Comput. Civ. Eng., vol. 34, p. [103] J. Jain, R. Parekh, J. Parekh, S. Shah, and P. Kanani, “Helmet
04020021, May 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000901. Detection and License Plate Extraction Using Machine Learning and
[82] J. Cai, X. Li, X. Liang, W. Wei, and S. Li, Construction Worker Computer Vision,” 2023, pp. 258–268. doi: 10.1007/978-3-031-22405-
Ergonomic Assessment via LSTM-Based Multi-Task Learning 8_20.
Framework. 2022, p. 224. doi: 10.1061/9780784483961.023. [104] L. Wang et al., “Investigation into Recognition Algorithm of Helmet
[83] J. Cai, Y. Zhang, L. Yang, H. Cai, and S. Li, “A context-augmented Violation Based on YOLOv5-CBAM-DCN,” IEEE Access, vol. 10, pp.
deep learning approach for worker trajectory prediction on unstructured 1–1, Jan. 2022, doi: 10.1109/ACCESS.2022.3180796.

8 VOLUME XX, 2017

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3350773

Author Name: Preparation of Papers for IEEE Access (February 2017)

[105] S. Yue, Q. Zhang, D. Shao, Y. Fan, and J. Bai, “Safety helmet wearing [124] Y.-J. Lee and M.-W. Park, “3D tracking of multiple onsite workers
status detection based on improved boosted random ferns,” Multimed. based on stereo vision,” Autom. Constr., vol. 98, pp. 146–159, Feb.
Tools Appl., vol. 81, May 2022, doi: 10.1007/s11042-022-12014-y. 2019, doi: 10.1016/j.autcon.2018.11.017.
[106] H.-P. Wan, W.-J. Zhang, H.-B. Ge, Y. Luo, and M. D. Todd, [125] M. Liu, D. Hong, S. Han, and S. Lee, Silhouette-Based On-Site
“Improved Vision-Based Method for Detection of Unauthorized Human Action Recognition in Single-View Video. 2016, p. 959. doi:
Intrusion by Construction Sites Workers,” J. Constr. Eng. Manag., vol. 10.1061/9780784479827.096.
149, no. 7, p. 04023040, Jul. 2023, doi: 10.1061/JCEMD4.COENG- [126] J. Yang, “Enhancing action recognition of construction workers using
13294. data-driven scene parsing,” J. Civ. Eng. Manag., vol. 24, pp. 568–580,
[107] W. Jixiu, N. Cai, W. Chen, H. Wang, and G. Wang, “Automatic Nov. 2018, doi: 10.3846/jcem.2018.6133.
detection of hardhats worn by construction personnel: A deep learning [127] S. Bhokare, L. Goyal, R. Ren, and J. Zhang, “Smart construction
approach and benchmark dataset,” Autom. Constr., vol. 106, p. 102894, scheduling monitoring using YOLOv3-based activity detection and
Oct. 2019, doi: 10.1016/j.autcon.2019.102894. classification,” J. Inf. Technol. Constr., vol. 27, pp. 240–252, Mar. 2022,
[108] J. Yang, Z. Shi, and Z. Wu, “Vision-based action recognition of doi: 10.36680/j.itcon.2022.012.
construction workers using dense trajectories,” Adv. Eng. Inform., vol. [128] J. Cai, Y. Zhang, and H. Cai, “Two-step long short-term memory
30, pp. 327–336, Aug. 2016, doi: 10.1016/j.aei.2016.04.009. method for identifying construction activities through positional and
[109] B. Xiao, H. Xiao, J. Wang, and Y. Chen, “Vision-based method for attentional cues,” Autom. Constr., vol. 106, Jul. 2019, doi:
tracking workers by integrating deep learning instance segmentation in 10.1016/j.autcon.2019.102886.
off-site construction,” Autom. Constr., vol. 136, p. 104148, Apr. 2022, [129] G. Torabi, A. Hammad, and N. Bouguila, “Two-Dimensional and
doi: 10.1016/j.autcon.2022.104148. Three-Dimensional CNN-Based Simultaneous Detection and Activity
[110] Y. Piao, W. Xu, and T.-K. Wang, “Dynamic Fall Risk Assessment Classification of Construction Workers,” J. Comput. Civ. Eng., vol. 36,
Framework for Construction Workers Based on Dynamic Bayesian Jul. 2022, doi: 10.1061/(ASCE)CP.1943-5487.0001024.
Network and Computer Vision,” J. Constr. Eng. Manag., Oct. 2021, doi: [130] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime Multi-Person
10.1061/(ASCE)CO.1943-7862.0002200. 2D Pose Estimation using Part Affinity Fields.” arXiv, Apr. 13, 2017.
[111] JackC. P. Cheng, P. K.-Y. Wong, H. Luo, M. Wang, and P. Leung, doi: 10.48550/arXiv.1611.08050.
“Vision-based monitoring of site safety compliance based on worker [131] H.-S. Fang et al., “AlphaPose: Whole-Body Regional Multi-Person
re-identification and personal protective equipment classification,” Pose Estimation and Tracking in Real-Time.” arXiv, Nov. 07, 2022. doi:
Autom. Constr., vol. 139, p. 104312, Jul. 2022, doi: 10.48550/arXiv.2211.03375.
10.1016/j.autcon.2022.104312. [132] M. Liu, S. Han, and S. Lee, “Tracking-based 3D human skeleton
[112] B. Yang, Z. Wang, and B. Liu, “Reidentification-Based Automated extraction from stereo video camera toward an on-site safety and
Matching for 3D Localization of Workers in Construction Sites,” J. ergonomic analysis,” Constr. Innov., vol. 16, pp. 348–367, Jul. 2016,
Comput. Civ. Eng., vol. 35, Nov. 2021, doi: 10.1061/(asce)cp.1943- doi: 10.1108/CI-10-2015-0054.
5487.0000975. [133] J. Seo, R. Starbuck, S. Han, S. Lee, and T. Armstrong, “Motion Data-
[113] H. Deng, Z. Ou, and Y. Deng, “Multi-Angle Fusion-Based Safety Driven Biomechanical Analysis during Construction Tasks on Sites,” J.
Status Analysis of Construction Workers,” Int. J. Environ. Res. Public. Comput. Civ. Eng., vol. 29, p. B4014005, Oct. 2014, doi:
Health, vol. 18, p. 11815, Nov. 2021, doi: 10.3390/ijerph182211815. 10.1061/(ASCE)CP.1943-5487.0000400.
[114] F. Li, Y. Chen, M. Hu, M. Luo, and G. Wang, “Helmet-Wearing [134] J. Seo, A. Alwasel, S. Lee, E. Abdel-Rahman, and C. Haas, “A
Tracking Detection Based on StrongSORT,” Sensors, vol. 23, p. 1682, comparative study of in-field motion capture approaches for body
Feb. 2023, doi: 10.3390/s23031682. kinematics measurement in construction,” Robotica, vol. 37, pp. 1–19,
[115] Y.-R. Lee, S.-H. Jung, K.-S. Kang, H.-C. Ryu, and H.-G. Ryu, “Deep Dec. 2017, doi: 10.1017/S0263574717000571.
learning-based framework for monitoring wearing personal protective [135] W. Chu, S. Han, X. Luo, and Z. Zhu, “Monocular Vision-Based
equipment on construction sites,” J. Comput. Des. Eng., vol. 10, no. 2, Framework for Biomechanical Analysis or Ergonomic Posture
pp. 905–917, Apr. 2023, doi: 10.1093/jcde/qwad019. Assessment in Modular Construction,” J. Comput. Civ. Eng., vol. 34, p.
[116] P. Li, F. Wu, S. Xue, and L. Guo, “Study on the Interaction Behaviors 04020018, Jul. 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000897.
Identification of Construction Workers Based on ST-GCN and YOLO,” [136] J. Kim, K. Daeho, J. Shah, and S. Lee, Synthetic Training Image
SENSORS, vol. 23, no. 14, p. 6318, Jul. 2023, doi: 10.3390/s23146318. Dataset for Vision-Based 3D Pose Estimation of Construction Workers.
[117] I. Jeelani, K. Asadi, H. Ramshankar, K. Han, and A. Albert, “Real- 2022, p. 262. doi: 10.1061/9780784483961.027.
time vision-based worker localization & hazard detection for [137] J. Seo and S. Lee, “Automated postural ergonomic risk assessment
construction,” Autom. Constr., vol. 121, p. 103448, Jan. 2021, doi: using vision-based posture classification,” Autom. Constr., vol. 128, p.
10.1016/j.autcon.2020.103448. 103725, Aug. 2021, doi: 10.1016/j.autcon.2021.103725.
[118] H. Huang, H. Hu, F. Xu, Z. Zhang, and Y. Tao, “Skeleton-based [138] E. Chian, Y. M. Goh, J. Tian, and B. Guo, “Dynamic identification of
automatic assessment and prediction of intrusion risk in construction crane load fall zone: A computer vision approach,” Saf. Sci., vol. 156,
hazardous areas,” Saf. Sci., vol. 164, p. 106150, Aug. 2023, doi: p. 105904, Dec. 2022, doi: 10.1016/j.ssci.2022.105904.
10.1016/j.ssci.2023.106150. [139] E. Chian, Y. M. Goh, and J. Tian, “Management of Safe Distancing
[119] X. Mei, X. Zhou, F. Xu, and Z. Zhang, “Human Intrusion Detection on Construction Sites During COVID-19: A Smart Real-time
in Static Hazardous Areas at Construction Sites: Deep Learning–Based Monitoring System,” Comput. Ind. Eng., vol. 163, p. 107847, Dec.
Method,” J. Constr. Eng. Manag., vol. 149, p. 04022142, Jan. 2023, doi: 2021, doi: 10.1016/j.cie.2021.107847.
10.1061/(ASCE)CO.1943-7862.0002409. [140] Z. Chen, L. Wu, H. He, Z. Jiao, and L. Wu, “Vision-based Skeleton
[120] M. Neuhausen, D. Pawlowski, and M. König, “Comparing Classical Motion Phase to Evaluate Working Behavior: Case Study of Ladder
and Modern Machine Learning Techniques for Monitoring Pedestrian Climbing Safety,” Hum.-Centric Comput. Inf. Sci., vol. 12, Jan. 2022,
Workers in Top-View Construction Site Video Sequences,” Appl. Sci., doi: 10.22967/HCIS.2022.12.001.
vol. 10, p. 8466, Nov. 2020, doi: 10.3390/app10238466. [141] S. Wu, L. Hou, G. Zhang, and H. Chen, “Real-time mixed reality-
[121] M. Neuhausen, P. Herbers, and M. König, “Using Synthetic Data to based visual warning for construction workforce safety,” Autom.
Improve and Evaluate the Tracking Performance of Construction Constr., vol. 139, p. 104252, Jul. 2022, doi:
Workers on Site,” Appl. Sci., vol. 10, p. 4948, Jul. 2020, doi: 10.1016/j.autcon.2022.104252.
10.3390/app10144948. [142] T. Dang, T. Le, T. Hong, and V. Nguyen, Fast and Accurate Fall
[122] L. Yongyue, Z. Zhou, and Y. Wang, A Tracking Method of Multi- Detection and Warning System Using Image Processing Technology.
Workers Onsite with Kalman Filter and OpenPose. 2021, p. 280. doi: 2021, p. 210. doi: 10.1109/ATC52653.2021.9598204.
10.1061/9780784483848.031. [143] P. Hung and N. Su, “Unsafe Construction Behavior Classification
[123] O. Angah and A. Chen, “Tracking multiple construction workers Using Deep Convolutional Neural Network,” Pattern Recognit. Image
through deep learning and the gradient based method with re-matching Anal., vol. 31, pp. 271–284, Apr. 2021, doi:
based on multi-object tracking accuracy,” Autom. Constr., vol. 119, p. 10.1134/S1054661821020073.
103308, Nov. 2020, doi: 10.1016/j.autcon.2020.103308.

8 VOLUME XX, 2017

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3350773

Author Name: Preparation of Papers for IEEE Access (February 2017)

[144] P. Zhai, J. Wang, and L. Zhang, “Extracting Worker Unsafe and Robotics (ecar 2018), Lancaster: Destech Publications, Inc, 2018,
Behaviors from Construction Images Using Image Captioning with pp. 470–476. Accessed: May 07, 2023. [Online]. Available:
Deep Learning–Based Attention Mechanism,” J. Constr. Eng. Manag., https://www.webofscience.com/wos/woscc/full-
vol. 149, Feb. 2023, doi: 10.1061/JCEMD4.COENG-12096. record/WOS:000453087200083
[145] H. Kim, K. Kim, and H. Kim, “Vision-Based Object-Centric Safety [164] A. Kamoona et al., “Random Finite Set-Based Anomaly Detection for
Assessment Using Fuzzy Inference: Monitoring Struck-By Accidents Safety Monitoring in Construction Sites,” IEEE Access, vol. PP, pp. 1–
with Moving Objects,” J. Comput. Civ. Eng., vol. 30, p. 04015075, Dec. 1, Jul. 2019, doi: 10.1109/ACCESS.2019.2932137.
2015, doi: 10.1061/(ASCE)CP.1943-5487.0000562. [165] V. Delhi, S. Lal, and A. Thomas, “Detection of Personal Protective
[146] Y.-S. Shin and J. Kim, “A Vision-Based Collision Monitoring System Equipment (PPE) Compliance on Construction Site Using Computer
for Proximity of Construction Workers to Trucks Enhanced by Posture- Vision Based Deep Learning Techniques,” Front. Built Environ., vol.
Dependent Perception and Truck Bodies’ Occupied Space,” 6, Sep. 2020, doi: 10.3389/fbuil.2020.00136.
Sustainability, vol. 14, p. 7934, Jun. 2022, doi: 10.3390/su14137934. [166] N. Nath, A. Behzadan, and S. Paal, “Deep learning for site safety:
[147] M. Neuhausen, P. Herbers, and M. König, Synthetic Data for Real-time detection of personal protective equipment,” Autom. Constr.,
Evaluating the Visual Tracking of Construction Workers. 2020. doi: vol. 112, p. 103085, Apr. 2020, doi: 10.1016/j.autcon.2020.103085.
10.1061/9780784482865.038. [167] H. Wu and Z. Liao, Design and implementation of safety helmet
[148] Q. Hu et al., “Intelligent Framework for Worker-Machine Safety detection system based on computer vision. 2021. doi:
Assessment,” J. Constr. Eng. Manag., vol. 146, p. 04020045, May 2020, 10.1117/12.2626829.
doi: 10.1061/(ASCE)CO.1943-7862.0001801. [168] A. Hayat and F. Morgado-Dias, “Deep Learning-Based Automatic
[149] B. Yang, B. Zhang, Q. Zhang, Z. Wang, M. Dong, and T. Fang, Safety Helmet Detection System for Construction Safety,” Appl. Sci.,
“Automatic detection of falling hazard from surveillance videos based vol. 12, p. 8268, Aug. 2022, doi: 10.3390/app12168268.
on computer vision and building information modeling,” Struct. [169] Z. Shanti, C.-S. Cho, B. Garcí a de Soto, Y.-J. Byon, C. Yeun, and T.
Infrastruct. Eng., vol. 18, pp. 1–15, Feb. 2022, doi: Kim, “Real-time monitoring of work-at-height safety hazards in
10.1080/15732479.2022.2039217. construction sites using drones and deep learning,” J. Safety Res., vol.
[150] Q. Hu, Y. Bai, L. He, J. Huang, H. Wang, and G. Cheng, “Workers’ 83, Oct. 2022, doi: 10.1016/j.jsr.2022.09.011.
Unsafe Actions When Working at Heights: Detecting from Images,” [170] M. Alateeq, F. P.P., and M. Ali, “Construction Site Hazards
Sustainability, vol. 14, p. 6126, May 2022, doi: 10.3390/su14106126. Identification Using Deep Learning and Computer Vision,”
[151] B. Lee, S. Hong, and H. Kim, “Determination of workers? compliance Sustainability, vol. 15, p. 2358, Jan. 2023, doi: 10.3390/su15032358.
to safety regulations using a spatio-temporal graph convolution [171] M. Ferdous and S. M. M. Ahsan, “PPE detector: a YOLO-based
network,” Adv. Eng. Inform., vol. 56, p. 101942, Apr. 2023, doi: architecture to detect personal protective equipment (PPE) for
10.1016/j.aei.2023.101942. construction sites,” PeerJ Comput. Sci., vol. 8, p. 24, Jun. 2022, doi:
[152] Ministry of Housing and Urban-Rural Development of the People's 10.7717/peerj-cs.999.
Republic of China. Accessed: May 09, 2023. [Online]. Available: [172] L. Zeng, X. Duan, Y. Pan, and M. Deng, “Research on the algorithm
https://www.mohurd.gov.cn/ of helmet-wearing detection based on the optimized yolov4,” Vis.
[153] “Industries at a Glance: Construction: NAICS 23 : U.S. Bureau of Comput., vol. 39, pp. 1–11, May 2022, doi: 10.1007/s00371-022-
Labor Statistics.” Accessed: May 09, 2023. [Online]. Available: 02471-9.
https://www.bls.gov/iag/tgs/iag23.htm [173] M. Nain, S. Sharma, and C. Sandeep, “Authentication control system
[154] S. Konda, H. M. Tiesman, and A. A. Reichard, “Fatal traumatic brain for the efficient detection of hard-hats using deep learning algorithms,”
injuries in the construction industry, 2003−2010,” Am. J. Ind. Med., vol. J. Discrete Math. Sci. Cryptogr., vol. 24, pp. 2291–2306, Nov. 2021,
59, no. 3, pp. 212–220, 2016, doi: 10.1002/ajim.22557. doi: 10.1080/09720529.2021.2011109.
[155] Code of practice for selection of personal protective equipment, GB/T [174] Z. Shanti et al., “A Novel Implementation of an AI-Based Smart
11651-2008. Accessed: May 09, 2023. [Online]. Available: Construction Safety Inspection Protocol in the UAE,” IEEE Access, vol.
https://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno=0307C61B4C PP, pp. 1–1, Dec. 2021, doi: 10.1109/ACCESS.2021.3135662.
CE89BE4316BCE2A36C57DC [175] Z. Xu, J. Huang, and K. Huang, “A novel computer vision‐based
[156] “1926.100 - Head protection. | Occupational Safety and Health approach for monitoring safety harness use in construction, ” IET
Administration.” Accessed: May 09, 2023. [Online]. Available: Image Process., vol. 17, p. n/a-n/a, Nov. 2022, doi: 10.1049/ipr2.12696.
https://www.osha.gov/laws- [176] H. Liang and S. Seo, “Automatic Detection of Construction Workers’
regs/regulations/standardnumber/1926/1926.100 Helmet Wear Based on Lightweight Deep Learning,” Appl. Sci., vol.
[157] M.-W. Park, N. Elsafty, and Z. Zhu, “Hardhat-Wearing Detection for 12, p. 10369, Oct. 2022, doi: 10.3390/app122010369.
Enhancing On-Site Safety of Construction Workers,” J. Constr. Eng. [177] J. Lee and S. Lee, “Construction Site Safety Management: A
Manag., vol. 141, p. 04015024, Jan. 2015, doi: Computer Vision and Deep Learning Approach,” Sensors, vol. 23, p.
10.1061/(ASCE)CO.1943-7862.0000974. 944, Jan. 2023, doi: 10.3390/s23020944.
[158] B. E. Mneymneh, M. Abbas, and H. Khoury, “Evaluation of computer [178] L. Wu, N. Cai, Z. Liu, A. Yuan, and H. Wang, “A one-stage deep
vision techniques for automated hardhat detection in indoor learning framework for automatic detection of safety harnesses in high-
construction safety applications,” Front. Eng. Manag., vol. 5, no. 2, pp. altitude operations,” Signal Image Video Process., vol. 17, Apr. 2022,
227–239, Jun. 2018, doi: 10.15302/J-FEM-2018071. doi: 10.1007/s11760-022-02205-3.
[159] B. E. Mneymneh, M. Abbas, and H. Khoury, “Automated Hardhat [179] W.-C. Chern, J. Hyeon, T. Nguyen, V. K. Asari, and H. Kim,
Detection for Construction Safety Applications,” Procedia Eng., vol. “Context-aware safety assessment system for far-field monitoring,”
196, pp. 895–902, Jan. 2017, doi: 10.1016/j.proeng.2017.08.022. Autom. Constr., vol. 149, p. 104779, May 2023, doi:
[160] B. E. Mneymneh, M. Abbas, and H. Khoury, “Vision-Based 10.1016/j.autcon.2023.104779.
Framework for Intelligent Monitoring of Hardhat Wearing on [180] G. Yan, Q. Sun, J. Huang, and Y. Chen, “Helmet Detection Based on
Construction Sites,” J. Comput. Civ. Eng., vol. 33, Mar. 2019, doi: Deep Learning and Random Forest on UAV for Power Construction
10.1061/(ASCE)CP.1943-5487.0000813. Safety,” J. Adv. Comput. Intell. Intell. Inform., vol. 25, pp. 40–49, Jan.
[161] H. Seong, H. Son, and C. Kim, “A Comparative Study of Machine 2021, doi: 10.20965/jaciii.2021.p0040.
Learning Classification for Color-based Safety Vest Detection on [181] S. Chen and K. Demachi, “Towards on-site hazards identification of
Construction-Site Images,” KSCE J. Civ. Eng., vol. 22, Sep. 2018, doi: improper use of personal protective equipment using deep learning-
10.1007/s12205-017-1730-3. based geometric relationships and hierarchical scene graph,” Autom.
[162] Y. Gu, S. Xu, Y. Wang, and L. Shi, An Advanced Deep Learning Constr., vol. 125, p. 103619, May 2021, doi:
Approach for Safety Helmet Wearing Detection. 2019, p. 674. doi: 10.1016/j.autcon.2021.103619.
10.1109/iThings/GreenCom/CPSCom/SmartData.2019.00128. [182] G. Iannizzotto, L. Lo Bello, and G. Patti, “Personal Protection
[163] J. Fu, Y. Chen, and S. Chen, “Design and Implementation of Vision Equipment detection system for embedded devices based on DNN and
Based Safety Detection Algorithm for Personnel in Construction Site,” Fuzzy Logic,” Expert Syst. Appl., vol. 184, p. 115447, Jun. 2021, doi:
in 2018 International Conference on Electrical, Control, Automation 10.1016/j.eswa.2021.115447.

8 VOLUME XX, 2017

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3350773

Author Name: Preparation of Papers for IEEE Access (February 2017)

[183] Y. Gu, Y. Wang, L. Shi, N. Li, L. Zhuang, and S. Xu, “Automatic [186] “The Benefits of AI In Construction.” Accessed: Nov. 04, 2023.
detection of safety helmet wearing based on head region location,” IET [Online]. Available: https://constructible.trimble.com/construction-
Image Process., vol. 15, pp. 2441–2453, Sep. 2021, doi: industry/the-benefits-of-ai-in-construction
10.1049/ipr2.12231. [187] H. Luo, J. Liu, W. Fang, Q. Yu, and Z. Lu, “Real-time smart video
[184] S. Chen, K. Demachi, and F. Dong, “Graph-based linguistic and surveillance to manage safety: A case study of a transport mega-project,”
visual information integration for on-site occupational hazards Adv. Eng. Inform., vol. 45, p. 101100, Aug. 2020, doi:
identification,” Autom. Constr., vol. 137, p. 104191, May 2022, doi: 10.1016/j.aei.2020.101100.
10.1016/j.autcon.2022.104191. [188] M. Aljanabi, M. Yaseen, A. Ali, S. Abed, and Chatgpt, “ChatGpt:
[185] Y. Li, H. Wei, Z. Han, J. Huang, and W.-D. Wang, “Deep Learning- Open Possibilities,” vol. 4, Jan. 2023, doi:
Based Safety Helmet Detection in Engineering Management Based on 10.52866/ijcsm.2023.01.01.0018.
Convolutional Neural Networks,” Adv. Civ. Eng., vol. 2020, pp. 1–10, [189] “Your AI-Powered Copilot for the Web.” Accessed: May 19, 2023.
Sep. 2020, doi: 10.1155/2020/9703560. [Online]. Available: https://www.microsoft.com/en-
us/bing?form=MA13FJ

JIAQI LI was born in Anshan, Liaoning, China in 1991. He received the B.S. degree in civil engineering from Liaoning Technical University and the M.S. degree in structural engineering from Shenyang Jianzhu University. In 2022, he received the Ph.D. degree in structural engineering from Dalian University of Technology. Since 2022, he has been a lecturer with the School of Civil Engineering, University of Science and Technology Liaoning. His research interests include artificial intelligence-based construction monitoring, deep learning-based building structure damage detection, and earthquake prevention for building structures. He is currently in charge of three research projects and has published eight papers as well as one patent to date.

QI MIAO was born in Anyang, Henan, China in 2000. She received the B.S. degree in civil engineering from Luoyang Institute of Science and Technology in 2022. She is currently pursuing the M.S. degree in civil engineering at the University of Science and Technology Liaoning. Her research interest is artificial intelligence-based construction safety monitoring.

ZHENG ZOU was born in Dalian, Liaoning, China in 1994. She received the B.S. degree in civil engineering from Dalian University of Technology in 2016 and the Ph.D. degree in structural engineering from Dalian University of Technology in 2022. Since 2022, she has been a lecturer with the College of Transportation Engineering, Dalian Maritime University. Her research interests include deep learning-based structure health monitoring and computer vision-based ancient building inspection.

HUAGUO GAO was born in 1979. He received the B.S. and M.S. degrees in civil engineering from Shenyang University of Technology. He is currently pursuing the Ph.D. degree at the Institute of Engineering Mechanics, China Earthquake Administration. Since 2006, he has been engaged in research and teaching; he is currently a professor at the School of Civil Engineering, University of Science and Technology Liaoning. His research interests include the seismic damage mechanism of engineering structures, new building materials, and the reinforcement of critical structures. He is also a Member of the Board of Directors of the Liaoning Provincial Civil Engineering and Architectural Society.

LIXIAO ZHANG was born in Hebei, China in 1993. She received the B.S. degree in civil engineering from Northwest A&F University in 2016 and the Ph.D. degree in structural engineering from Dalian University of Technology in 2022. Since 2022, she has been a lecturer with the College of Transportation Engineering, Dalian Maritime University. Her research interests include bridge structure health monitoring and structural damage visualization and diagnosis.

ZHAOBO LI was born in Inner Mongolia, China in 1989. He received the B.S. degree in engineering management from Hulunbuir University in 2014 and the M.S. degree in structural engineering from Shenyang Jianzhu University in 2017. He is currently pursuing the Ph.D. degree in structural engineering at China University of Mining and Technology, and he is also a senior engineer at the Hohhot Science and Technology Innovation Service Center. His research interest is intelligent inspection of building curtain walls. He has published more than 10 academic papers in related fields and has obtained more than 70 national patents, including 15 invention patents, as well as more than 20 national computer software copyrights.

NAN WANG was born in Anshan, Liaoning, China in 1990. She received the B.S. and M.S. degrees from the University of Science and Technology Liaoning, and in 2022 she received the Ph.D. degree in civil engineering from Nanjing University of Aeronautics and Astronautics. Since 2022, she has been a lecturer with the School of Civil Engineering, University of Science and Technology Liaoning. Her research interests include magnesium cementitious materials and concrete durability.


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
