YOLOv5 Ship Detection & Classification

Using a public dataset of images of maritime vessels provided by Analytics Vidhya, manual annotations were made on a subsample of images with Roboflow using the ground truth classifications provided by the dataset. YOLOv5, a prominent open source family of object detection models that comes with an out-of-the-box pre-training on the Common Objects in Context (COCO) dataset, was used to train on annotations of sub-classifications of maritime vessels.

BOHR International Journal of Smart Computing and Information Technology


2023, Vol. 4, No. 1, pp. 7–16
DOI: 10.54646/bijscit.2023.32
[Link]

METHODS

Object detection and ship classification using YOLOv5


Sean Brown*, Caitlin Hall*, Raffaele Galliera* and Sikha Bagui*
Department of Computer Science, University of West Florida, Pensacola, FL, United States

*Correspondence:
Sean Brown,
sab138@[Link]
Caitlin Hall,
ceh59@[Link]
Raffaele Galliera,
rg101@[Link]
Sikha Bagui,
bagui@[Link]

Received: 08 February 2023; Accepted: 03 March 2023; Published: 18 March 2023

Using a public dataset of images of maritime vessels provided by Analytics Vidhya, manual annotations were made
on a subsample of images with Roboflow using the ground truth classifications provided by the dataset. YOLOv5,
a prominent open source family of object detection models that comes with an out-of-the-box pre-training on the
Common Objects in Context (COCO) dataset, was used to train on annotations of sub-classifications of maritime
vessels. YOLOv5 provides significant results in detecting a boat. The model was trained in the cloud using Google
Colab on training, validation, and test sets of images. Three of our five subclasses, namely, cruise ships, ROROs (Roll
On Roll Off, typically car carriers), and military ships, have very distinct shapes and features and yielded positive
results. Two of our subclasses, namely, the tanker and cargo ship, have similar characteristics when the cargo
ship is unloaded and not carrying any cargo containers. This yielded interesting misclassifications that could be
improved in future work. Our trained model achieved a validation mean Average Precision (mAP@0.5)
of 0.932 across all subclassifications of ships.
Keywords: object detection, image classification, maritime, ship classification, YOLOv5.

Introduction

In the military maritime realm, there is a desire to understand the environment passively without using emitters such as RADAR and LiDAR. A vessel can keep a low profile and detect navigational hazards or threats. In the commercial maritime realm, ships can use images and videos in conjunction with RADAR to add another layer of confidence to hazard avoidance. These industries need reliable object detection using cameras. Hence, this study will explore the use of electro-optical cameras taking still images to be used in a dataset to retrain a YOLOv5 model, a deep learning object detector, on such images and evaluate its performance.

Object detection is a technique for locating instances of objects in either images or key frames from videos by leveraging machine learning algorithms with the goal of replicating recognition intelligence using a computer. Object detection can be applied to many domains as long as a sufficient domain-based dataset is available. However, object detection algorithms must also consider condition factors applicable to that domain, such as poor weather or lighting conditions.

To study the application of object detection using still images and videos, an applicable dataset of maritime images from Analytics Vidhya (1, 2) was used, and five different classifications of ships were defined for the algorithm to detect. This study also takes into consideration various factors that can affect the results and attempts to improve the training dataset and algorithmic configurables. A subsample of images has been annotated using Roboflow (3) and used to train an existing off-the-shelf object detection algorithm called YOLOv5 (4).

YOLOv5 was born due to the improvements made by Jocher (5), who ported YOLOv3’s Darknet weights
(6) to PyTorch (7). PyTorch is an open-source machine learning framework that allows users to implement powerful computational functions that are accelerated with a GPU’s processing power. YOLOv5 version 6.1 was released on February 22, 2022, which featured the YOLOv5n model for ultralight edge devices. At the time of this report, the latest version, 6.1, was utilized. The YOLOv5s model was used due to the constraints of using free cloud-based tools to facilitate our training and detection, as described later in the section “Methodology.”

This study is organized as follows. The section “Literature Review” reviews works related to the use of machine learning techniques on images of maritime ships, specifically with the YOLO family of algorithms and similar machine learning algorithms. The section “Methodology” describes the dataset and methods of pre-processing, annotation, training, and validation. The section “Results and Discussion” discusses the results and the key metrics of mAP, precision, recall, AUC, and F1 scores. Finally, the section “Conclusion and Future Work” discusses the conclusions and future ideas.

In the literature review, we will compare and contrast previous works with this study and determine the uniqueness of the dataset and methodology, as well as increase the fundamental knowledge on the subject of object detection and classification. It should be noted that the demand for advanced video surveillance and perception capability has been requested by the United States Department of Defense (DoD). The DoD has set aside millions of dollars to procure and field innovative technologies from non-traditional vendors, making this research in-demand and valuable.

Many studies discuss machine learning in the application of ship classification (8), but to date, none of these works address the problem with the off-the-shelf application of YOLOv5 and the dataset from Analytics Vidhya. The uniqueness of this study is that it applies YOLOv5 to a large dataset consisting of various types of ships, in addition to the varying quality of images from multiple viewpoints.

Literature review

Research has been done for ship classification using alternative algorithms and methods and different datasets. Kim et al. (9) used a different dataset that includes images and is focused on the improvement of a preexisting classification with different defining subclasses of ships including “Boat,” “Speed Boat,” “Vessel Ship,” “Ferry,” “Kayak,” “Buoy,” “Sail Boat,” and “Others.” They achieved a mean average precision (mAP) (0.5) value of 0.898 and an mAP (0.5:0.95) value of 0.528. Li et al. (8) presented a combination of real-time ship classification, Ship Detection from Visual Image, and YOLOv3 to achieve an mAP value of 0.741.

Tang et al. (10) compared different versions of YOLO for datasets of Synthetic Aperture Radar images and traditional satellite camera images of ships. While similar in nature in terms of using YOLO and object detection of ships, the datasets are quite different in terms of image capture angle. The dataset chosen for this study, from Analytics Vidhya, contains much closer images of various horizontal profiles with five distinct classifications, namely, “Cargo,” “Military,” “Carrier,” “Cruise,” and “Tankers.” Using YOLOv3, Liu et al. (11) took steps to improve the algorithm’s accuracy on large, pixel-dense satellite images by reducing the original network’s 32-times down-sampling to four times, as well as using a sliding window method to cut down large images to many smaller images.

The structure of YOLOv5 can be split into input, backbone, neck, and prediction. Some research has been done on creating new backbones, improving existing backbones (12), or swapping YOLOv5’s backbone for another existing backbone. Ting et al. (13) swapped the existing backbone for Huawei’s GhostNet (14) and stacked two of these GhostNets into what they call a Ghostbottlenet. Using the Ghostbottlenet instead of YOLOv5’s original backbone, they were able to improve feature extraction and reduce the overall model size. Zhou et al. (15) also replaced the original backbone with Mixed Receptive Field Convolution (MixConv), where MixConv makes use of multiple convolution kernels to improve feature extraction by increasing attention to pixel coordinates in horizontal and vertical channels. Qiao et al. (16) attempted to re-identify maritime vessels that the model has already seen even at other orientations, using a Global-and-Local Fusion-based Multi-view Feature Learning by replacing the backbone with ResNet-50 for global and local feature extraction.

Orientation recognition is the focus of another study where the researchers use a single shot detector (SSD) for both multiclass vessel detection and five defined orientations (i.e., front, front side, side, backside, and back) (17, 18). SSD is a feedforward ConvNet that explores the presence of an object instance in the predefined default bounding boxes, followed by a non-maximum suppression stage to produce the final detection. Tang et al. (19) explored the use of an SSD with hue, saturation, and value pre-processing to improve the Intersection Over Union. The hue, saturation, and value pre-processing operation is used to extract regions of interest to feed to the YOLO network.

There are other studies in the same field that cross compare different object detection algorithms, such as Faster R-CNN (20), R-FCN (21), SSD, and EfficientDet (22), while still attempting to detect maritime vessels. Iancu et al. (23) found that in small to medium size objects greater than 162 pixels, Faster R-CNN with Inception-Resnet v2 outperforms the others except in detecting large objects where EfficientDet does a better job (23). It is interesting to note that all of the convolutional neural network (CNN) based detectors were also pre-trained on the COCO (24) dataset, similar to YOLO, which is also a CNN (23).

The COCO (24) dataset by Microsoft houses 330,000 images, 1.5 million object instances, and 80 object categories. The richly annotated dataset contains objects in their natural context and depicts complex everyday scenes. The dataset focuses on segmenting individual object instances rather than what other object recognition datasets support, such as image classification, object localization, or segmentation. Since one of the 80 object categories is “boat,” we can utilize transfer learning for our five sub-classifications of “boat” since YOLOv5 is pretrained on COCO.
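As a rough illustration of how five sub-classifications might be declared for fine-tuning, the sketch below writes a YOLOv5-style dataset configuration file (train and validation paths, class count `nc`, and class `names`). The class names follow the labels reported later in this study, but their order, the directory paths, the file name `ships.yaml`, and the helper `build_dataset_yaml` are hypothetical placeholders and are not taken from the original project.

```python
from pathlib import Path
import tempfile

# Class names as reported in this study after relabeling (order is an assumption).
SHIP_CLASSES = ["Container Ship", "Cruise Ship", "Military Ship", "RORO", "Tanker"]


def build_dataset_yaml(train_dir, val_dir, names):
    """Return the text of a YOLOv5-style dataset config (paths, nc, names)."""
    quoted = ", ".join(f"'{name}'" for name in names)
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n"
        f"nc: {len(names)}\n"
        f"names: [{quoted}]\n"
    )


# Hypothetical paths; an annotation tool export would normally supply these.
yaml_text = build_dataset_yaml("ships/train/images", "ships/valid/images", SHIP_CLASSES)

# Write into a temporary directory so the sketch leaves no files elsewhere.
out_path = Path(tempfile.mkdtemp()) / "ships.yaml"
out_path.write_text(yaml_text)
print(yaml_text)
```

With a file like this, the class count seen by the detector drops from the 80 COCO categories to the five ship subclasses, while the pretrained checkpoint still supplies the starting weights.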

Methodology

Pre-processing

Before this study gets into the explanation of the implemented methods, it would be beneficial to give some background on the dataset and some design decisions and difficulties found while taking the first steps with the dataset. This dataset of maritime vessels, provided by Analytics Vidhya (1, 2), came with over 8,000 images that were already ground truth labeled for the classification of “Cargo,” “Military,” “Carrier,” “Cruise,” and “Tanker.” While the images did have a ground truth label, they did not have individual bounding boxes for labeled objects inside the image. Specifically, it was found that images of cruise ships would often have two or more cruise ships in the frame of the image.

Cargo ship was renamed Container ship, and Carrier ship was renamed “RORO,” which stands for Roll-On-Roll-Off ships. ROROs have a very particular shape and size, as shown in Figure 1.

FIGURE 1 | Roll-on-roll-off ship classification example.

Labels were renamed to distinctly identify the difference between a “Carrier” and “Cargo.” Other difficulties with this dataset were similarities in the images of container ships and tankers. While annotating and drawing bounding boxes by hand progressed, it became challenging to determine the difference between the two, specifically when a container ship was empty and did not have any containers loaded on the deck. Some tankers had piping and other mechanical features on the deck, while some had a flat deck and looked like an empty cargo container ship.

The data were separated into a training set, validation set, and testing set, to be used by YOLOv5. Roboflow was utilized to assist the project in pre-processing the data. Using Roboflow, 1,500 images were annotated, with 300 annotations for each classification. Keeping the number of annotations even across classifications boosted the performance of training YOLOv5. Other Roboflow pre-processing steps were taken to orient any flipped or rotated images automatically and also to resize the images to 416 pixels by 416 pixels.

Training an object detection model

To train a model on the data, a cloud environment, Google Colab (25), was utilized. Google Colab offers free GPU runtime and leverages a Python-based Jupyter notebook. From this environment, you can clone the public GitHub repository for YOLOv5 and install all the necessary Python package requirements.

Roboflow’s export to a Jupyter notebook is seamless. There are two options for online and offline imports to a project notebook. For an offline configuration, the images and annotations can be downloaded and placed in a notebook. Through the internet, Roboflow’s API can be used in a few lines of auto-generated code to bring in the dataset. Python, the programming language of choice for data science engineers, comes loaded with tools for transforming results into usable graphs and using the trained YOLOv5 model in action to produce images with classification bounding boxes and confidence levels as shown in Figure 2.

In the first attempt at training a YOLOv5 model, we passed the following arguments to the [Link] script:

    python [Link] --img 416 --batch 16 --epochs 150 --data {[Link]}/[Link] --weights [Link] --cache

The “img” flag defines the size in pixels of the input images by length and width. The “batch” flag determines the batch size, that is, how much can be loaded into memory; this is dependent on the hardware.

A batch size of 16 was chosen as a recommended default, but the YOLOv5 documentation warns not to use small batch sizes as they can produce poor batchnorm statistics. The “epochs” flag is the number of complete passes the training dataset
makes through the algorithm; 150 was a recommended default. The number of epochs is a crucial parameter to tweak to look for overfitting or underfitting. Overfitting negatively impacts the model’s performance and ability to generalize to new data.

FIGURE 2 | Container ship classification example.

The “data” flag is the file location of the training data. The “weights” flag is the small COCO pre-trained checkpoint of [Link]. This checkpoint was selected to take into consideration the cloud-based training environment duration requirement of free use for less than 24 h. Lastly, the “cache” flag was used to cache images for faster training.

Post-processing

Our first attempt at training a custom YOLOv5 model used some recommended values for the parameters of [Link]. After viewing some of the charts produced by TensorFlow, it was determined that our model was overfitting. An undesirable positive slope occurred at the end of the bounding box regression loss graph (Mean Squared Error) or box_loss graph. To correct this overfitting problem, the epochs were lowered from 150 to 100 to reduce the number of times the training data goes through the algorithm.

Results and discussion

Dataset quality

The dataset was obtained from Analytics Vidhya for the Game of Deep Learning: Computer Vision Hackathon (1, 2). It was essential to gather a wide variety of photos with a few key factors to test the performance of the machine learning model. The first type of image contained in the dataset is a simple image of one ship with that ship being the focus of the image. The second type of image is a single image containing multiple ships of the same classification. The third type of image is a single image containing numerous vessels of different classifications. The fourth type is an image of a ship, but it is not the main focus of the photograph. The ship may be in the background or blend into the environment more so than in other photographs.

The dataset also contains images both in color and black and white, as well as blurry images and clear images. Including all of these types of images in the dataset ensures that it challenges the machine learning model to predict classifications that are not inherently obvious and pushes the limits further of how well the model can predict given less than ideal images.

Object detection

As mentioned in the section “Methodology,” the machine learning model was run through Google Colab using YOLOv5 and Python in a Jupyter notebook. The model ran 100 epochs and produced images in which it made predictions on classifications of ships in multiple images. Some of these images were classified correctly, and some of them were not. In a few of the predictions, there were various ships within the images for added complexity to the model. Figures 1–7 show examples of the algorithm’s output. The model performs differently on different images for object detection. The model may correctly classify an object, misclassify a ship, classify a background object as a ship, or miss the detection entirely.

Figure 3 shows the model successfully classifying multiple ships within the same image of type “Cruise Ship,” so it is understood that the model has the capability to identify more than one object per image.

Figures 2–4 all show successful classifications of three different types of ships with images of varying quality. It can be seen that Figures 2 and 3 have a higher confidence in classification than Figure 4, likely due to the poor image quality of Figure 4. However, it is important to note that the model can detect an object even when the image quality is not ideal.

Figure 5 is an example of the model missing the detection entirely. It can be seen that there are two cruise ships in the image, but the model identifies only one.

Figure 6 shows another capability of the model in that it can identify overlapping ships. In this image, the tanker was located directly in the foreground of the container ship, and the model was still able to detect both ships successfully. Likely the container ship suffers low confidence due to the model not observing the bow and stern of the container ship, which are very distinct features.

Finally, Figure 7 shows an example of the model misclassifying a background object and a ship. In this image,
the model confused a bridge for a tanker, most likely due to the large towers of the bridge resembling a mast erected vertically above the hull of a tanker. The false positive of the large bridge could be mitigated by adding additional images containing bridges and maritime ships that do not have bounding boxes containing the bridge. There are a few aircraft carriers mixed into the dataset, but adding more would improve precision.

FIGURE 3 | Cruise ship classification example.

FIGURE 4 | Military ship classification example.

FIGURE 5 | Missing classification example.

FIGURE 6 | Overlay classification example.

Post-processing

As described in the “Post-processing” subsection of the “Methodology” section, the method of rectifying the model’s apparent overfitting observed in the box_loss graph was to reduce the epochs from 150 to 100. As seen in Figure 8, the data for the box_loss appears to be trending in a healthy manner and, unlike in the previous training attempt, does not indicate overfitting.

The phenomenon of overfitting occurs when the model has been trained for too long, becomes too specific to the training set, and performs poorly with new data. An ideal learning
TABLE 1 | Validation summary results.

Class           Images  Labels  P      R      mAP@0.5  mAP@0.5:0.95
All             292     322     0.892  0.902  0.929    0.598
Container Ship  292     69      0.845  0.844  0.920    0.567
Cruise Ship     292     59      0.871  0.913  0.902    0.568
Military Ship   292     65      0.921  0.938  0.961    0.583
RORO            292     60      0.993  1      0.995    0.740
Tanker          292     69      0.830  0.777  0.870    0.532
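The mAP figures in the "All" row of Table 1 can be related to the per-class rows with a small calculation. The sketch below assumes that the "All" value is the unweighted mean of the five per-class AP values; the dictionary `PER_CLASS_AP` and the helper `mean_ap` are illustrative names introduced here, and the numbers are copied from Table 1.

```python
# Per-class AP values copied from Table 1: (mAP@0.5, mAP@0.5:0.95).
PER_CLASS_AP = {
    "Container Ship": (0.920, 0.567),
    "Cruise Ship": (0.902, 0.568),
    "Military Ship": (0.961, 0.583),
    "RORO": (0.995, 0.740),
    "Tanker": (0.870, 0.532),
}


def mean_ap(per_class, column):
    """Unweighted mean over classes of the AP value at the given column index."""
    values = [aps[column] for aps in per_class.values()]
    return sum(values) / len(values)


map_50 = mean_ap(PER_CLASS_AP, 0)      # mean of the mAP@0.5 column
map_50_95 = mean_ap(PER_CLASS_AP, 1)   # mean of the mAP@0.5:0.95 column

print(f"mAP@0.5      = {map_50:.4f}")
print(f"mAP@0.5:0.95 = {map_50_95:.4f}")
```

Under that assumption the means come out at roughly 0.930 and 0.598, against 0.929 and 0.598 in the "All" row. The small gap in the first value is consistent with the per-class entries having been rounded to three decimals.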

curve graph is identified when the training and validation losses decrease to the point of stability.

FIGURE 7 | Misclassification example.

FIGURE 8 | Validation box loss chart.

Performance

The concept of algorithm performance is interpreted as the quality of results the machine learning model produced. For this project’s scope, the focus will be on the model’s performance to accurately determine the correct classification of ships. To do this, a variety of key performance indicators are used. The most important of these indicators is the mAP metric. However, there are an additional four other metrics worth discussing as a part of the algorithm results.

The mAP metric achieved for this project was 0.929 at the 0.5 level and 0.598 at the 0.5:0.95 level. The mAP metric relies on the precision, recall, precision-recall curve, and intersection over union, which are discussed in detail later in the paper. To give a brief definition of each, intersection over union is the measure of how much the bounding boxes on the object overlap. This value will be 1.0 for boxes that line up exactly or 0.0 if there is no overlap at all.

FIGURE 9 | Precision-recall chart.

Precision is the measure of identifying relevant objects; recall is the measure of the truth within the bounding boxes. The mAP metric is the mean of the AP metrics. An AP metric is the area under the precision-recall curve, shown in Figure 9. This calculation is shown in Equation (1):

\[ \mathrm{AP} = \sum_{k} \left[ \mathrm{Recalls}(k) - \mathrm{Recalls}(k+1) \right] \cdot \mathrm{Precisions}(k) \tag{1} \]

The larger the area under the curve (AUC), the higher the AP, which indicates a better model. Our mAP metric is higher at the 0.5 Intersection Over Union level, known as the traditional level, at which a detection counts as correct when the overlap of the predicted and ground truth bounding boxes covers at least half of their union area. At this level, the model has an mAP of 0.929, as seen in Table 1.

The four other main charts are shown in Figures 9–12. These figures visualize the key performance indicators. To better understand these key performance indicators and interpret the results, it is crucial to first discuss the confusion matrix in Figure 13 and the four key metrics used in the calculation of the key performance indicators. The four key factors used in these equations (with example definitions in terms of a RORO classification) are as follows:

• True Positive (TP): The ship was classified as a RORO and it was actually a RORO.
• True Negative (TN): The ship was classified as not a RORO and it was actually not a RORO.
• False Positive (FP): The ship was classified as a RORO and it was not actually a RORO.
• False Negative (FN): The ship was classified as not a RORO and it actually was a RORO.

FIGURE 10 | Precision chart.

FIGURE 11 | Recall chart.

FIGURE 12 | F1 curve chart.

FIGURE 13 | Confusion matrix.

While the concept of these metrics is essential to the indicators below, the confusion matrix will help to understand the performance of the machine learning model, how the model has made the predictions, and the area containing the most prevalent errors of the model predictions.

The confusion matrix in Figure 13 shows the true classification compared to the predicted classification. It can visualize where the model is getting confused in classifying or differentiating between two different classifications. This is visualized through a two-dimensional matrix with one axis being the actual truth, or ground truth, and the other axis being the prediction, or the model’s truth. In a perfect situation, 1.00 would be seen across the diagonal from the top left to the bottom right of the matrix. However, the model is not perfect. The model gets pretty close to perfect as the correct classification percentage for each category of ship is as follows:

• Container ship: 86%
• Cruise ship: 93%
• Military ship: 98%
• RORO: 98%
• Tanker: 72%

In addition to showing the percentage that the algorithm correctly classified, the breakdown of incorrect classifications can be seen as well. The most common misclassification occurred when the algorithm mistook the background of the image for a cruise ship. Confusion between classifications of ships arose the most from incorrectly classifying tankers as container ships 20% of the time.

One of the performance indicators is precision, shown in Figure 10. Precision measures the accuracy of the predictions or the percentage of the predictions that are correct. The
precision can be calculated by dividing the number of true positives by the summation of true positives and false positives. This calculation is shown in Equation (2):

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]
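A minimal sketch of Equation (2) follows, using made-up counts rather than results from this study. The second helper shows how the same ratio changes when detections below a confidence threshold are discarded, which is the kind of relationship a precision-over-confidence chart such as Figure 10 plots. The function names, the detection list, and the zero-prediction convention are assumptions made for illustration only.

```python
def precision(true_positives, false_positives):
    """Equation (2): TP / (TP + FP); returns 0.0 when nothing was predicted."""
    predicted = true_positives + false_positives
    if predicted == 0:
        return 0.0  # convention chosen for this sketch to avoid division by zero
    return true_positives / predicted


def precision_at(detections, threshold):
    """Precision over the detections whose confidence is at least `threshold`."""
    kept = [is_correct for conf, is_correct in detections if conf >= threshold]
    tp = sum(1 for is_correct in kept if is_correct)
    fp = len(kept) - tp
    return precision(tp, fp)


# Hypothetical detections as (confidence, prediction was correct) pairs.
DETECTIONS = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.20, False)]

print(f"precision(45 TP, 5 FP) = {precision(45, 5):.2f}")
for threshold in (0.0, 0.5, 0.7):
    print(f"precision at confidence >= {threshold}: {precision_at(DETECTIONS, threshold):.2f}")
```

In this toy list, raising the threshold removes the low-confidence mistakes first, so precision rises with confidence; real detections need not behave this neatly.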
The higher the precision, the more accurate the model was at predicting the correct classification of vessels. Figure 10 shows precision over confidence. In this case, confidence is a value between 0.0 and 1.0, which indicates how confident the model is that the prediction is correct. In general, a confidence of 0.7 and above is strong, between 0.3 and 0.7 is okay, and below 0.3 is weak and probably not a good prediction.

Figure 10 shows that as the confidence grows, the precision grows as well. It is interesting that the precision grows logarithmically and not linearly with confidence. It starts with a precision of around 20% when the confidence is 0.0, then rockets to approximately 75% by a confidence of 0.1. It is evident from this chart that the tanker classification contains the most variability and is the most incorrectly classified vessel in this model because its precision over confidence has the most drastic spikes at a high confidence value while the other classifications remain somewhat stable.

The second performance indicator is recall, shown in Figure 11. Recall is an indicator of how well all the positives (or correct classifications) are identified. This indicator is calculated by dividing the number of true positives by the summation of true positives and false negatives. This calculation is shown in Equation (3):

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]

According to Figure 11, the recall decreases as the confidence increases.

The third performance indicator is the precision-recall rate, shown in Figure 9. A precision-recall (PR) curve plots the precision values on the y-axis and the recall values on the x-axis for each classification in the model. Precision is measured by Eq. (2), and recall is measured by Eq. (3). It would be ideal to see that the algorithm has both high recall and high precision, but in most cases, this is not possible, so there is usually a trade-off between the two in machine learning algorithms. To analyze this specific graph, it is known that a good machine learning model produces a high AUC.

In Figure 9, it can be seen that the RORO classification has the greatest AUC with a PR value of 0.995. The tanker classification had the lowest AUC with a PR value of 0.855. Overall, the results show that on average, of all classes, the AUC is significant with a PR value of 0.932, indicating a good result.

Finally, the fourth performance indicator is the F1 score, shown in Figure 12. The F1 score combines the precision and recall metrics (as their harmonic mean) and is a measure of the model’s accuracy on the dataset. Generally, an F1 score above 0.9 is excellent, between 0.8 and 0.9 is good, between 0.5 and 0.8 is okay, and below 0.5 is not good. In this study, the F1 score is highest at a confidence of 0.433 with an F1 score of 0.89.

This score rating is just bordering the excellent range, but remains in the good range. One of the factors for this score is class balancing, in which the goal is to have an even amount or fairly distributed dataset of the different classification types contained in the data, and a large amount of them for more accurate learning.

FIGURE 14 | Validation CLS loss chart.

FIGURE 15 | Validation object loss chart.

Validation

The goal of validation is to validate the model’s performance by how well it correctly predicted the correct classification of the ship and that the object is accurately detected. There can be, and are, instances where the model identifies a background image as some classification of a vessel or classifies a ship incorrectly. Figures 8, 14, and 15 represent the performance of the validation set. In each of these figures, it is important to look at the dark orange line, which is the validation, compared to the faded orange line, which is the training data.

The validation box loss chart in Figure 8 shows the mean square error (MSE) of the validation data vs. training data. MSE depicts how close the regression line is to a set of points by calculating the distance from the points to the regression line and squaring the error. You can observe from the graph that the MSE consistently declines and never trends in the positive direction. Box loss shows how well the predicted box overlaps with the validation bounding box.

Figure 14 shows the classification loss or cross-entropy. Entropy measures the average amount of information needed to represent a random event drawn from a probability distribution for a random variable. Cross-entropy is the measure of the difference between two probability distributions of random
sets and can be used as a loss function when optimizing classification models. Figure 14 shows the validation of the model's ability to correctly classify an object given a set of possible classifications. Lastly, in Figure 15, the object loss chart shows the binary cross-entropy, that is, how accurately the model decides whether an object of interest is present or not.

In each of these charts, it would be evident if the model was overfitting or underfitting. If overfitting, the algorithm would do well on the training dataset but poorly on the validation data. This scenario would be visually identifiable by the validation line being consistently above the training line even though the loss on the training data is low. Overfitting may mean that the model is too complex for the given dataset.

Underfitting occurs when the algorithm performs poorly not only on the validation data but on the training data as well. It is visually identifiable in the validation charts if the validation and training lines are separated and the training line is above the validation line. In contrast to overfitting, underfitting may mean that the model is not complex enough for the given dataset. All three of our validation charts show healthy loss functions.

Table 1 shows the training and validation results, reporting precision, recall, and mAP. The standard metric for comparing object detectors is mAP. We are pleased with the results for our classifications and compare them with other related projects in the section "Conclusion."

Conclusion and future work

With the methods chosen (using a public dataset, annotating images from the dataset with Roboflow, and training the off-the-shelf machine learning algorithm YOLOv5 in a cloud-based environment), the application of commercial and military passive perception of maritime ships is achievable. Many machine learning algorithms are available for object detection and classification of maritime ships, but our results indicate that YOLOv5 is a competitive CNN, as shown by our mAP value of 0.929 and the healthy validation curves presented in the section "Validation."

We found that our custom-trained model using the Analytics Vidhya dataset performed better, in terms of mAP averaged over Intersection over Union (IoU) thresholds from 0.5 to 0.95 in steps of 0.05, than the related work of Kim et al. (9): their model resulted in an mAP@0.5:0.95 of 0.528, and our model performed at 0.598. Another comparison is with the related work of Li et al. (8), who used YOLOv3 to achieve an mAP value of 0.741, whereas our custom YOLOv5 model produced an mAP value of 0.929. Iancu et al. (23) at best produced an mAP value of 55.48% while cross-comparing four other competitors to YOLOv5, compared to our model's mAP of 92.9%.

In future work, to rectify the misclassifications of cargo container ships and tankers with similar features, additional images with varying characteristics would have to be added to the datasets. The algorithm would benefit from annotated images of container ships that are empty, and likewise of tankers, at various stages of cargo load. Other future work that could be useful to commercial and military customers is estimating the distance and bearing of the target after properly detecting and classifying an object of interest.

We plan to investigate improving our results by increasing the number of hand-annotated images in our training and validation datasets. Increasing the number of images per classification will also further refine our results. There is also news of newer versions, YOLOv6 and YOLOv7, that could be utilized, and larger pre-trained weights could be tried to compare results. With the growing fleet of commercial and military ships, there is demand for research like this study, and we think the future will use cameras and machine learning as a passive perception system for maritime ships.

Author contributions

SB and CH prepared the dataset by annotating the images, worked on applying the machine learning algorithms, and wrote the initial draft of the manuscript. RG and SB provided guidance and supervised the whole project. SB and CH. RG provided comments, and SB oversaw and made the final edits of the manuscript to bring it to publication form. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Analytics Vidhya. Game of Deep Learning: Computer Vision Hackathon. (n.d.). Available online at: [Link]contest/game-of-deep-learning/ (accessed June 5, 2022).
2. Kaggle. Game of Deep Learning: Ship Datasets. (n.d.). Available online at: [Link]ship-datasets (accessed June 5, 2022).
3. Roboflow Inc. Roboflow Annotate [Computer Software]. Des Moines, IA: Roboflow Inc (2020).
4. Ultralytics. GitHub - ultralytics/yolov5: YOLOv5. (n.d.). Available online at: [Link] (accessed June 5, 2022).
5. Roboflow Inc. YOLOv5 New Version Explained [May 2022]. (n.d.). Available online at: [Link]and-evaluation/ (accessed June 12, 2022).
6. Ultralytics. GitHub - ultralytics/yolov3: YOLOv3. (n.d.). Available online at: [Link] (accessed June 12, 2022).
7. PyTorch. GitHub - pytorch/pytorch: PyTorchv1.11. (n.d.). Available online at: [Link] (accessed June 12, 2022).
8. Li H, Deng L, Yang C, Liu J, Gu Z. Enhanced YOLO v3 tiny network for real-time ship detection from visual image. IEEE Access Pract Innov Open Solut. (2021) 9:16692–706. doi: 10.1109/ACCESS.2021.3053956
9. Kim J-H, Kim N, Park Y, Won CS. Object detection and classification based on YOLO-V5 with improved maritime dataset. J Math Sci Educ. (2022) 10:377. doi: 10.3390/jmse10030377
10. Tang G, Zhuge Y, Claramunt C, Men S. N-YOLO: a SAR ship detection using noise-classifying and complete-target extraction. Remote Sens. (2021) 13:871. doi: 10.3390/rs13050871
11. Liu R, Wang T, Zhou Y, Wang C, Shan G, Snoussi H, et al. A satellite image target detection model based on an improved single-stage target detection network. Proceedings of the 2019 Chinese Automation Congress (CAC). Hangzhou: (2019). p. 4931–6. doi: 10.1109/CAC48633.2019.8997495
12. Zhang X, Yan M, Zhu D, Guan Y. Marine ship detection and classification based on YOLOv5 model. J Phys Conf Ser. (2022) 2181:012025. doi: 10.1088/1742-6596/2181/1/012025
13. Ting L, Baijun Z, Yongsheng Z, Shun Y. Ship detection algorithm based on improved YOLO V5. Proceedings of the 2021 6th International Conference on Automation, Control and Robotics Engineering (CACRE), Shanghai, China, September 23-25, 2022. Shanghai: (2021). p. 483–7. doi: 10.1109/CACRE52464.2021.9501331
14. Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C, et al. GhostNet: more features from cheap operations. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, June 13-19, 2020. Seattle, WA: (2020). p. 1577–86. doi: 10.48550/arXiv.1911.11907
15. Zhou S, Yin J. YOLO-Ship: a visible light ship detection method. Proceedings of the 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE), January 14-16, 2022, Guangzhou, China. Guangzhou: (2022). p. 113–8. doi: 10.1109/ICCECE54139.2022.9712768
16. Qiao D, Liu G, Dong F, Jiang S-X, Dai L. Marine vessel re-identification: a large-scale dataset and global-and-local fusion based discriminative feature learning. IEEE Access Pract Innov Open Solut. (2020) 8:27744–56. doi: 10.1109/ACCESS.2020.2969231
17. Ghahremani A, Kong Y, Bondarev E, de With PHN. Multi-class detection and orientation recognition of vessels in maritime surveillance. Econ Inst. (2019) 31:266–261. doi: 10.2352/ISSN.2470-[Link]-266
18. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C, et al. SSD: single shot multibox detector. ArXiv [Preprint] (2015). doi: 10.48550/arxiv.1512.02325
19. Tang G, Liu S, Fujino I, Claramunt C, Wang Y, Men S, et al. H-YOLO: a single-shot ship detection approach based on region of interest preselected network. Remote Sens. (2020) 12:4192. doi: 10.3390/rs12244192
20. Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intellig. (2017) 39:1137–49. doi: 10.1109/TPAMI.2016.2577031
21. Dai J, Li Y, He K, Sun J. R-FCN: object detection via region based fully convolutional networks. ArXiv [Preprint] (2016). doi: 10.48550/arxiv.1605.06409
22. Tan M, Pang R, Le Q. EfficientDet: scalable and efficient object detection. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, June 13-19, 2020. Seattle, WA: (2020). p. 10778–10787. doi: 10.1109/CVPR42600.2020.01079
23. Iancu B, Soloviev V, Zelioli L, Lilius J. ABOships—An inshore and offshore maritime vessel detection dataset with precise annotations. Remote Sens. (2021) 13:988. doi: 10.3390/rs13050988
24. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, et al. Microsoft COCO: common objects in context. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T editors. Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, September 6-12, 2014. (Vol. 8693), Zurich: (2014). p. 740–55. doi: 10.1007/978-3-319-10602-1_48
25. Google. Google Colaboratory (Version 2022/5/20) [Computer software]. Mountain View, CA: Google (2019).
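
The validation and conclusion sections describe the F1 score as a combination of precision and recall, and report mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The following is a minimal Python sketch of those two calculations, not the evaluation code used in this study. The function names, the corner-coordinate box format, and the example boxes and scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds() -> List[float]:
    """The ten thresholds 0.50, 0.55, ..., 0.95 behind mAP@0.5:0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred: Box, truth: Box) -> List[float]:
    """Thresholds at which a prediction would count as a match."""
    score = iou(pred, truth)
    return [t for t in iou_thresholds() if score >= t]


# Hypothetical values, not taken from the study's results.
predicted = (10.0, 10.0, 110.0, 60.0)
ground_truth = (20.0, 10.0, 120.0, 60.0)

print(f1_score(0.9, 0.8))
print(iou(predicted, ground_truth))
print(thresholds_passed(predicted, ground_truth))
```

A prediction that overlaps the ground truth only loosely counts as correct at the lower thresholds but not at the stricter ones, which is why a value averaged over 0.5 to 0.95 is typically lower than the value at a single threshold of 0.5.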
