Deep Learning for Shark Detection
Wenlu Zhang∗, Xinyi Chen∗, Dhara Bhadani∗, Patrick Rex†, Yu Yang‡, Christopher G. Lowe†, Hen-Geul Yeh§
∗ Department of Computer Engineering and Computer Science, California State University Long Beach, CA, 90840
† Department of Biological Sciences, California State University Long Beach, CA, 90840
‡ Department of Chemical Engineering, California State University Long Beach, CA, 90840
§ Department of Electrical Engineering, California State University Long Beach, CA, 90840
Abstract—Automatic detection of free-ranging sharks near beach areas is of great importance for maintaining safe human-shark interactions. The task is especially challenging because of the limitations of existing shark detection methods and the sparse features of field images collected from Unmanned Aerial Vehicles (UAVs). Recently, deep learning has been tremendously successful in real-world applications such as autonomous driving, object detection, face recognition, and medical diagnosis. In this paper, we propose an automated pipeline for shark detection. Specifically, we apply several state-of-the-art object detection models to our shark field data set: Faster R-CNN, Mask R-CNN, Feature Pyramid Network (FPN), and RetinaNet. We report quantitative comparison results for these object detection models and provide example detection images. The experiments show that the models are capable of fast and efficient detection of both shark and non-shark objects.

Index Terms—Object Detection, Shark Recognition, Deep Learning, Convolutional Neural Network
I. INTRODUCTION

Automatically identifying and detecting the activities of free-ranging sharks plays an essential role in maintaining a healthy marine ecosystem and reducing the risk to public safety for beachgoers [1]. Recent technical advances in UAVs provide a new, low-cost opportunity for managing human-shark interactions [2]–[5]. Most existing research on shark detection combines UAVs with machine learning techniques [6]–[8]; however, these methods have not fully investigated state-of-the-art object detection models.

From the perspective of computer vision, classical object detection methods usually involve three major stages: 1) a multi-scale sliding window scans the whole image to select informative region proposals; 2) hand-crafted feature extractors such as SIFT [9] and HOG [10] describe each region; and 3) a shallow machine learning classifier, such as a Support Vector Machine [11], makes the prediction. However, the computational cost of these traditional models is high, and the hand-crafted features are often not robust.

Recently, deep learning has made significant gains across a broad range of models, including the Convolutional Neural Network (CNN) [12]–[14], the Recurrent Neural Network (RNN) [15]–[18], the Transformer [19], and the Generative Adversarial Network (GAN) [20]. Deep learning has also been tremendously successful in object detection model design. State-of-the-art object detectors can be divided into two major categories: region-proposal-based detection models, and classification and bounding box regression based detection models [21], [22]. In this paper, we mainly focus on region-proposal-based Convolutional Neural Networks, including Faster R-CNN [23] and Mask R-CNN [24], as well as the one-stage detector RetinaNet [25], because these methods can handle imbalanced classes, image illumination changes, and sparsity challenges.

II. MATERIALS AND METHOD

A. Field Data Collection

Survey Protocol. Small Unmanned Aerial Vehicles (sUAVs) were used to conduct video surveys of the southern California coast between Point Conception, California (34.4486° N, 120.4716° W) and San Diego, California (32.7157° N, 117.1611° W) from January 2019 to December 2020. Specific beaches where large aggregations of juvenile white sharks (Carcharodon carcharias) were present were selected to increase the probability of shark observations. To ensure that data were collected under a wide range of environmental conditions, survey days were selected semi-haphazardly. The sUAV was flown at 5.5 to 6.0 m/s along a 1 km stretch of coastline, following the specific contour of each beach. The altitude of the sUAV varied from 30 m to 120 m, resulting in variation in the pixel silhouettes of subjects in the frame. The sUAV was positioned so that the shoreline and the outside of the wave break were within the same camera frame at all times. This ensured that all human subjects using the shoreline for recreation would be encompassed within the first transect of the survey. If no juvenile white sharks were observed during the first 1 km transect of the survey, the pilot flew the drone 75 m offshore and then returned to a position parallel to the start of the survey. This was repeated until a shark was spotted or until the pilot had performed a transect 500 m offshore, in which case the survey ended. If a shark was observed, it was tracked by positioning the sUAV directly above the central point of the shark and following the shark for the remaining battery life of the sUAV. Survey duration ranged from 16 to 22 min.

Image Selection for Analysis. Video surveys were filmed at 4K resolution (3840 × 2160) at 30 frames per second using the stock onboard camera of the Phantom 4 Pro v2.0 (Da-Jiang Innovations) sUAV.
Images were selected from the video using VLC media player (VideoLAN) during post-survey review of the video surveys. Instances where humans and sharks were in the same camera frame were prioritized for analysis; however, images where only humans or only sharks were within the frame were also analyzed. Images with varying environmental conditions were selected to ensure that the algorithm was trained on a range of light levels, glare, wind waves, and water clarity. Images were selected only if all subjects in the frame were clearly visible. For example, images where sharks were too deep in the water column for their silhouettes to be fully articulated in the labeling software, or where humans were obscured by broken waves, were not analyzed.
B. Method

In this section, we review several state-of-the-art region-proposal-based object detection models. R-CNN (Regions with CNN features) [26] is a two-stage detection algorithm: the first stage identifies region proposals in an image that may contain objects, and the second stage classifies the object in each region. The R-CNN detector first generates region proposals using external methods such as Selective Search or Edge Boxes. A CNN then extracts a fixed-length feature vector from each region. Finally, the region bounding boxes are refined by an SVM using the features generated by the CNN. However, training R-CNN is expensive, and detection is slow at test time. Fast R-CNN [27] addresses these issues. The approach is similar to R-CNN, but instead of feeding each region proposal to the CNN separately, the entire input image is fed to the CNN once to generate a feature map. From this feature map, each region proposal is warped and reshaped into a fixed-size feature vector by a Region of Interest (RoI) pooling layer. The RoI feature vector is then used to predict the label and bounding box of the proposed region. Fast R-CNN is much more efficient than R-CNN because the convolutional computation for overlapping regions is shared.
Although Fast R-CNN achieves promising experimental results, it still relies heavily on external region proposal models, and region proposal generation becomes the bottleneck that limits the efficiency of the detection system. To solve this problem, Ren et al. [23] introduced the Region Proposal Network (RPN). The RPN shares convolutional layers with the object detection network; on top of these layers, it adds a few convolutional layers that regress bounding boxes and objectness scores at each rectangular region. Unlike Selective Search [28] or Edge Boxes [29], the RPN is a Fully Convolutional Network (FCN) designed to be trained end-to-end for generating region proposals. Therefore, Faster R-CNN can be trained end-to-end by back-propagation and stochastic gradient descent (SGD) to learn shared features.
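As a concrete illustration (not the authors' exact training script), the following sketch shows how a Faster R-CNN model with an R50-FPN backbone could be configured and fine-tuned for a two-class shark/non-shark detection task using Detectron2 [34]; the dataset names, class count, and solver settings are illustrative assumptions:

# Hypothetical Detectron2 sketch: fine-tune Faster R-CNN (R50-FPN) for shark detection.
# Dataset names, class count, and solver settings are illustrative assumptions.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("shark_train",)            # assumed registered dataset name
cfg.DATASETS.TEST = ("shark_val",)
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # start from COCO-pretrained weights
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2              # shark and non-shark
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 5000

trainer = DefaultTrainer(cfg)                    # builds the model, optimizer, and data loaders
trainer.resume_or_load(resume=False)
trainer.train()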
The Feature Pyramid Network (FPN) [30] introduced a multi-scale architecture for the feature extractor that can be combined with independent object detection architectures. A series of convolutional layers creates a "bottom-up" feature pyramid with different receptive fields. A "top-down" pyramidal series of layers is then created via upsampling to simulate higher resolution for the layers of higher semantic value. To this end, the convolutional outputs from the bottom-up pyramid are laterally connected to the corresponding layers on the top-down pathway, in a manner similar to ResNet [31]. At each of these merged layers, a 3 × 3 convolution is applied to reduce the aliasing effect of upsampling; its output is the final feature map used for object detection at that layer's specific scale.
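For reference, torchvision ships a generic FPN module that implements exactly this lateral-connection and top-down scheme. The sketch below (with illustrative channel sizes, not the authors' implementation) shows how backbone feature maps at three scales are merged into a 256-channel pyramid:

import torch
from collections import OrderedDict
from torchvision.ops import FeaturePyramidNetwork

# Bottom-up feature maps from a backbone at three scales (channel counts are
# illustrative; ResNet stages C3-C5 would give 512/1024/2048 channels).
feats = OrderedDict()
feats["c3"] = torch.rand(1, 512, 64, 64)
feats["c4"] = torch.rand(1, 1024, 32, 32)
feats["c5"] = torch.rand(1, 2048, 16, 16)

# Lateral 1x1 convolutions project each level to 256 channels; the top-down
# pathway upsamples and merges, and a 3x3 convolution smooths each merged map.
fpn = FeaturePyramidNetwork(in_channels_list=[512, 1024, 2048], out_channels=256)
pyramid = fpn(feats)
for name, p in pyramid.items():
    print(name, tuple(p.shape))   # every level now has 256 channels at its own resolution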
Mask R-CNN [24] is an extension of Faster R-CNN. It adds a branch for predicting an instance segmentation mask on each Region of Interest (RoI), in parallel with the existing branches for classification and bounding box regression. In addition to the two outputs of Faster R-CNN, Mask R-CNN therefore adds a third output, a binary mask for each RoI. This additional mask output requires the extraction of a finer spatial layout of the object, so Mask R-CNN introduces a simple, quantization-free layer, called RoIAlign, that preserves spatial information for objects of different scales. During training, Mask R-CNN uses a multi-task loss on each RoI, L = Lcls + Lbox + Lmask [24]. Here the classification loss Lcls and the regression loss Lbox are identical to those of Fast R-CNN, while Lmask is the average per-pixel binary cross-entropy loss applied to the sigmoid outputs for the ground-truth class mask. The mask head predicts an m × m mask for each RoI to retain spatial dimensions. RoIPool, a key operation of Faster R-CNN, performs coarse spatial quantization for feature extraction, which introduces misalignment; this may not affect classification, but it degrades pixel-accurate mask prediction. RoIAlign resolves this issue by replacing the harsh quantization of RoIPool with bilinear interpolation, computing the exact values of the input features, and yields a significant improvement in mask accuracy. Therefore, Mask R-CNN is a simple, flexible, and fast system for instance segmentation and object detection.
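To make the RoIPool/RoIAlign distinction concrete, the short sketch below (an illustration, not the paper's code) compares torchvision's roi_pool and roi_align operators on a single feature map; roi_align samples with bilinear interpolation instead of snapping box coordinates to the feature grid:

import torch
from torchvision.ops import roi_pool, roi_align

# One feature map of shape (N, C, H, W); spatial_scale maps image-space boxes
# to feature-map coordinates (e.g. 1/16 for a stride-16 backbone stage).
features = torch.rand(1, 256, 50, 50)
# Boxes are (batch_index, x1, y1, x2, y2) in image coordinates (illustrative values).
boxes = torch.tensor([[0, 120.3, 80.7, 260.9, 210.2]])

pooled = roi_pool(features, boxes, output_size=(7, 7), spatial_scale=1.0 / 16)
aligned = roi_align(features, boxes, output_size=(7, 7), spatial_scale=1.0 / 16,
                    sampling_ratio=2, aligned=True)
print(pooled.shape, aligned.shape)  # both (1, 256, 7, 7); roi_align avoids quantization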
RetinaNet [25] is a simple one-stage detector that uses a novel loss function, the focal loss, to address the class imbalance problem during training. Class imbalance arises because the number of locations that do not contain objects (negative locations) dramatically exceeds the number of locations that do (positive locations). The vast number of negative locations can overwhelm the model and lead to degenerate solutions. Recent two-stage detectors address this issue by filtering out most negative locations in the first stage [23], [28], [32], [33], but at the cost of speed. Rather than using two stages, RetinaNet adds a modulating factor to the cross-entropy loss that dynamically adjusts its scale, down-weighting the contribution of easily classified negative locations and highlighting the contribution of positive locations.
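The modulating factor is easiest to see in code. The sketch below is a minimal binary focal loss in PyTorch following the formulation in [25], FL(pt) = -alpha * (1 - pt)^gamma * log(pt); it is illustrative rather than the authors' implementation (torchvision also provides an equivalent sigmoid_focal_loss):

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss per anchor/location (illustrative sketch).

    logits: raw scores, shape (N,); targets: 0/1 labels, shape (N,).
    """
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma is the modulating factor: it shrinks the loss of easily
    # classified (high-confidence) locations, which are mostly background.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Example: 1000 anchors with only 3 positives -- the focal loss keeps the few
# positives from being drowned out by the many easy negatives.
logits = torch.randn(1000)
targets = torch.zeros(1000)
targets[:3] = 1.0
print(focal_loss(logits, targets))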
III. EXPERIMENTAL RESULTS AND EVALUATION

A. Experimental Setup

Our shark detection data set contains a total of 1241 images of size 3840 × 2160. The data set is inherently multi-class, multi-scale, and sparse.
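The backbone names in the tables below (R50-C4, R50-DC5, R50-FPN, R101-FPN) and the COCO-style AP/AR metrics correspond to the Detectron2 [34] model zoo and evaluator. As a hedged sketch (the annotation paths, dataset name, and checkpoint file are assumptions, not the authors' actual setup), a COCO-format version of the shark data set could be registered and evaluated as follows:

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import build_detection_test_loader
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultPredictor
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

# Assumed COCO-format annotation file and image folder (illustrative paths).
register_coco_instances("shark_val", {}, "annotations/val.json", "images/val")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2
cfg.MODEL.WEIGHTS = "output/model_final.pth"     # assumed fine-tuned checkpoint
cfg.DATASETS.TEST = ("shark_val",)

predictor = DefaultPredictor(cfg)                # wraps the model for single-image inference
evaluator = COCOEvaluator("shark_val", output_dir="./eval")
val_loader = build_detection_test_loader(cfg, "shark_val")
# Reports COCO-style AP, AP50, AP75, per-category AP, and AR, as in the tables below.
print(inference_on_dataset(predictor.model, val_loader, evaluator))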
[Figure panels: (a) Original shark image; (a) Original standup paddleboarding image; (a) Wave image without label; (a) Original wader image]
Fig. 4: Body-boarding Example Image
Fig. 6: Wader Example Image

TABLE I: Comparison of Experimental Results Faster R-CNN
Model      AP(%)    AP50(%)  AP75(%)  AP-Shark(%)  AP-NoShark(%)  AR(%)  Time(s/img)
R50-DC5    14.656   42.711   6.384    23.671       5.641          34.3   0.139
R50-FPN    26.958   63.111   13.180   30.417       23.498         44.1   0.081
R101-FPN   26.389   62.835   11.469   32.399       20.379         43.7   0.105

TABLE II: Comparison of Experimental Results Mask R-CNN
Model      AP(%)    AP50(%)  AP75(%)  AP-Shark(%)  AP-NoShark(%)  AR(%)  Time(s/img)
R50-C4     16.384   45.871   6.285    20.588       12.180         35.1   0.463
R50-DC5    20.331   56.286   10.462   28.527       12.134         37.5   0.428
R50-FPN    23.932   60.747   12.878   26.984       20.897         43.0   0.251
R101-FPN   25.078   65.492   11.354   27.577       22.578         42.6   0.219

TABLE III: Comparison of Experimental Results RetinaNet
Model      AP(%)    AP50(%)  AP75(%)  AP-Shark(%)  AP-NoShark(%)  AR(%)  Time(s/img)
R50-FPN    36.004   70.549   33.233   51.374       20.634         43.9   0.083
R101-FPN   38.390   73.219   34.438   52.521       24.259         50.5   0.110
As shown in Figs. 4, 5, and 6, these images contain only non-shark objects such as body-boarders, stand-up paddle-boarders (SUP), and waders; RetinaNet accurately detects each non-shark object with bounding boxes of different sizes generated from the region proposals. However, due to the sparse nature of the images and the limited size of the training set, we also show one mistaken prediction made by RetinaNet: in Fig. 7, the model falsely identifies a wave as a non-shark object.
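The example detection images can be reproduced with a short inference-and-drawing loop. The sketch below is illustrative only: it assumes a `predictor` built as in the evaluation sketch of Section III-A and an assumed image path, and is not the authors' visualization code:

import cv2
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import Visualizer

# `predictor` is assumed to be a DefaultPredictor configured as in Section III-A.
im = cv2.imread("images/val/bodyboarder_0001.jpg")   # assumed example frame
outputs = predictor(im)                               # predicted boxes, scores, classes

# Draw the predicted bounding boxes on the frame (BGR -> RGB for the visualizer).
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get("shark_val"), scale=0.5)
vis = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imwrite("bodyboarder_0001_pred.jpg", vis.get_image()[:, :, ::-1])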
IV. CONCLUSION AND FUTURE WORK

In this work, we apply deep learning networks to automatic shark detection. We implement three major deep learning architectures based on ResNet backbones and region-based object detection models. In particular, we utilize the Feature Pyramid Network (FPN) and the focal loss function to address the imbalanced shark detection problem. In the future, we plan to implement the You Only Look Once (YOLO) network and the Single Shot MultiBox Detector (SSD) to further improve the performance of automatic shark detection.

ACKNOWLEDGMENT

We would like to acknowledge Adrian Campos and Bernardo Cobos for the engineering setup and discussions.

REFERENCES

[1] P. Simmons and M. I. Mehmet, "Shark management strategy policy considerations: community preferences, reasoning and speculations," Marine Policy, vol. 96, pp. 111–119, 2018.
[2] G. Shrivakshan and C. Chandrasekar, "A comparison of various edge detection techniques used in image processing," International Journal of Computer Science Issues (IJCSI), vol. 9, no. 5, p. 269, 2012.
[3] J. C. van Gemert, C. R. Verschoor, P. Mettes, K. Epema, L. P. Koh, and S. Wich, "Nature conservation drones for automatic localization and counting of animals," in European Conference on Computer Vision. Springer, 2014, pp. 255–270.
[4] L. F. Gonzalez, G. A. Montes, E. Puig, S. Johnson, K. Mengersen, and K. J. Gaston, "Unmanned aerial vehicles (uavs) and artificial intelligence revolutionizing wildlife monitoring and conservation," Sensors, vol. 16, no. 1, p. 97, 2016.
[5] B. Kane, C. A. Zajchowski, T. R. Allen, G. McLeod, and N. H. Allen, "Is it safer at the beach? spatial and temporal analyses of beachgoer behaviors during the covid-19 pandemic," Ocean & Coastal Management, vol. 205, p. 105533, 2021.
[6] N. Sharma, P. Scully-Power, and M. Blumenstein, "Shark detection from aerial imagery using region-based cnn, a study," in Australasian Joint Conference on Artificial Intelligence. Springer, 2018, pp. 224–236.
[7] R. Gorkin, K. Adams, M. J. Berryman, S. Aubin, W. Li, A. R. Davis, and J. Barthelemy, "Sharkeye: real-time autonomous personal shark alerting via aerial surveillance," Drones, vol. 4, no. 2, p. 18, 2020.
[8] A. P. Colefax, B. P. Kelaher, D. E. Pagendam, and P. A. Butcher, "Assessing white shark (carcharodon carcharias) behavior along coastal beaches for conservation-focused shark mitigation," Frontiers in Marine Science, vol. 7, p. 268, 2020.
[9] P. C. Ng and S. Henikoff, "Sift: Predicting amino acid changes that affect protein function," Nucleic acids research, vol. 31, no. 13, pp. 3812–3814, 2003.
[10] X. Wang, T. X. Han, and S. Yan, "An hog-lbp human detector with partial occlusion handling," in 2009 IEEE 12th international conference on computer vision. IEEE, 2009, pp. 32–39.
[11] C. Cortes and V. Vapnik, "Support-vector networks," Machine learning, vol. 20, no. 3, pp. 273–297, 1995.
[12] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning
applied to document recognition,” Proceedings of the IEEE, vol. 86,
no. 11, pp. 2278–2324, 1998.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” in Advances in neural infor-
mation processing systems, 2012, pp. 1097–1105.
[14] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,
V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,”
in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2015, pp. 1–9.
[15] J. F. Kolen and S. C. Kremer, A field guide to dynamical recurrent
networks. John Wiley & Sons, 2001.
[16] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training
recurrent neural networks,” in International conference on machine
learning, 2013, pp. 1310–1318.
[17] ——, “Understanding the exploding gradient problem,” CoRR,
abs/1211.5063, vol. 2, p. 417, 2012.
[18] A. Karpathy, J. Johnson, and L. Fei-Fei, “Visualizing and understanding
recurrent networks,” arXiv preprint arXiv:1506.02078, 2015.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv preprint
arXiv:1706.03762, 2017.
[20] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,”
arXiv preprint arXiv:1406.2661, 2014.
[21] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look
once: Unified, real-time object detection,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2016, pp. 779–
788.
[22] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C.
Berg, “Ssd: Single shot multibox detector,” in European conference on
computer vision. Springer, 2016, pp. 21–37.
[23] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-
time object detection with region proposal networks,” arXiv preprint
arXiv:1506.01497, 2015.
[24] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in
Proceedings of the IEEE international conference on computer vision,
2017, pp. 2961–2969.
[25] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss
for dense object detection,” in Proceedings of the IEEE international
conference on computer vision, 2017, pp. 2980–2988.
[26] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature
hierarchies for accurate object detection and semantic segmentation,”
in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2014, pp. 580–587.
[27] R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international
conference on computer vision, 2015, pp. 1440–1448.
[28] J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders,
“Selective search for object recognition,” International journal of com-
puter vision, vol. 104, no. 2, pp. 154–171, 2013.
[29] C. L. Zitnick and P. Dollár, “Edge boxes: Locating object proposals from
edges,” in European conference on computer vision. Springer, 2014,
pp. 391–405.
[30] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie,
“Feature pyramid networks for object detection,” in Proceedings of the
IEEE conference on computer vision and pattern recognition, 2017, pp.
2117–2125.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[32] P. O. Pinheiro, R. Collobert, and P. Dollár, “Learning to segment object
candidates,” arXiv preprint arXiv:1506.06204, 2015.
[33] P. O. Pinheiro, T.-Y. Lin, R. Collobert, and P. Dollár, “Learning to refine
object segments,” in European conference on computer vision. Springer,
2016, pp. 75–91.
[34] Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, and R. Girshick, “Detectron2,”
https://github.com/facebookresearch/detectron2, 2019.
[35] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "Pytorch: An imperative style, high-performance deep learning library," arXiv preprint arXiv:1912.01703, 2019.
[36] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft coco: Common objects in context," in European conference on computer vision. Springer, 2014, pp. 740–755.