Underwater Fish Species Classification using Convolutional Neural Network
and Deep Learning
Dhruv Rathi*, Sushant Jain**, Dr. S. Indu***

* Department of Computer Engineering, Delhi Technological University, New Delhi, India
Phone: +919868904222, E-Mail: [email protected], http://dhruvrathi.me/
** Department of Computer Engineering, Delhi Technological University, New Delhi, India
E-Mail: [email protected]
*** Department of Electronics and Communication, Delhi Technological University, New Delhi, India
E-Mail: [email protected], http://www.dtu.ac.in/Web/Departments/Electronics/faculty/sindu.php
Abstract— The target of this paper is to recommend a method for the automated classification of fish species. High-accuracy fish classification is required for a greater understanding of fish behavior in ichthyology and by marine biologists. Maintaining a ledger of the number of fishes per species and marking the endangered species in large and small water bodies is required by concerned institutions. The majority of available methods focus on the classification of fishes outside of water, because underwater classification poses challenges such as background noise, distortion of images, the presence of other water bodies in images, image quality and occlusion. This method uses a novel technique based on Convolutional Neural Networks, Deep Learning and Image Processing to achieve an accuracy of 96.29%. This method yields considerable improvements in discrimination accuracy over previously proposed methods.

Keywords— Fish Species Classification, Deep Learning, Convolutional Neural Network, Morphological Operations, Otsu Binarization, Otsu Thresholding, Pyramid Mean Shifting, Computer Vision

I. INTRODUCTION

Monitoring the behavior of different species of fishes is of primary importance for getting insights into a marine ecological system. The count and distribution of the various species of fishes can give valuable insights about the health of the ecological system and can be used as a parameter for monitoring environmental changes. Visual classification of fishes can also help trace their movement and reveal patterns and trends in their activities, providing a deeper knowledge about the species as a whole.

The study of the behavior of fishes can be automated by getting visual feedback from multiple locations and automating the process of visual classification, which will give significantly larger amounts of data for pattern recognition. Although there have been many advances in classifying fish taken out of water [1][2][3][4] or in artificial conditions, such as in tanks with adequate lighting [5], there has been no significant breakthrough in the classification of fishes in datasets created from underwater videos. The challenges faced during underwater classification of fish species include noise, distortion, overlap, segmentation error and occlusion [6]. The complex environment also restricts simpler approaches such as luminance-based methods, and background subtraction suffers from issues such as color shifts, inconsistent lighting, sediments in the water and undulating underwater plants.

Fish species recognition is a multi-class classification problem and a compelling research field of machine learning and computer vision. The state-of-the-art algorithms operate on individual input images and perform classification mainly using shape and texture feature extraction and matching [7][8]. The existing work either deals with a small dataset distinguishing between a small number of species or has a low accuracy. Our proposed method uses Convolutional Neural Networks, which makes the process simpler and more robust even while working with a large dataset. CNNs are also much more flexible and can adapt to new incoming data as the dataset matures. We use the fish dataset from the Fish4Knowledge project [24] to test our algorithm. We perform the classification by pre-processing the images using Gaussian Blurring, Morphological Operations, Otsu's Thresholding and Pyramid Mean Shifting, and then feed the enhanced images to a Convolutional Neural Network for classification.

The remainder of the paper is organized as follows: Section II reviews the current research done in this field; Section III delineates the proposed algorithm; Section IV gives the experimental results; Section V discusses the Conclusion and Future Work; and Section VI gives the References.

II. RELATED WORKS

There are numerous approaches proposed by researchers for the classification of fish species, as delineated below.

C. Spampinato et al. [7] attempt to classify fishes using texture features extracted from Gabor filtering, the gray-level histogram and the histogram of Fourier descriptors of boundaries, with shape features extracted using the Curvature Scale Space Transform. The algorithm was tested on a dataset of 360 images and achieved an accuracy of 92%.

Takakazu Ishimatsu et al. [8] used two identification features, speckle patterns and scale forms of fish, and used morphological algorithms and filters for discrimination. They only showed the differentiation between three species: Pilchard Sardine (ma-iwashi), Japanese Horse Mackerel (ma-aji) and Common Mackerel (ma-saba), with accuracies of 90%, 88% and 90% respectively. Their method is dependent on the size and shape of the morphological filters used.

Junguk Cho et al. [9] used Haar classifiers with the Scythe Butterfly fish as their test species. Their method is heavily dependent on the background environment in the image of the fish and the angle from which the photograph is taken.

Andrew Rova et al. [10] attempt to classify fishes by warping the images prior to classification using SVM, achieving an accuracy of 90% on a dataset of 320 images.

Chomptip Pornpanomchai et al. [11] proposed and developed a fish recognition system based on shape and texture. They compared Artificial Neural Networks (ANN) and the Euclidean Distance Method (EDM) on a test set of 300 images and a training set of 600 images, achieving accuracies of 81.67% and 99.00% with EDM and ANN respectively.

Rodrigues et al. [13] recommended an algorithm based on SIFT feature extraction and Principal Component Analysis (PCA), but they worked on a very small dataset of 162 images encompassing 6 different species, obtaining an accuracy of 92%.

S. Sclaroff et al. [14] performed deformable shape detection, with object detection done through model-based region grouping. Computational complexity is the major drawback of this method.

C. Spampinato et al. [15] worked on 20 underwater videos to detect, track and count fishes with an accuracy of 85%. They performed detection using a Moving Average Detection Algorithm and an Adaptive Gaussian Mixture Model.

Andres Hernandez-Serna et al. [16] used Artificial Neural Networks for the automatic identification of species. They worked on a dataset of 697 images, achieving an accuracy of 91.65%.

Arjun Kumar Joginipelly et al. [17] propose an automatic technique using Gabor filters to extract important features from two species, Epinephelus morio and Ocyurus chrysurus. The proposed algorithm is tested on 200 frames, each containing many fish and non-fish regions. The accuracy is 70.6% for Epinephelus morio and 80.3% for Ocyurus chrysurus.

Deokjin Joo et al. [18] extracted stripe and color patterns of wild cichlids and used Random Forests and SVM for classification, achieving an accuracy of 72% on a dataset of 594 wild cichlids. Their accuracy is low and they only target cichlid fishes.

Yi-Haur Shiau et al. [19] proposed a method of sparse representation-based classification for the recognition and verification of fishes which maximizes the probability of the partial rankings thus obtained. They worked on a dataset of 1000 images and achieved a highest accuracy of 81.8% for a particular feature space dimensionality.

S. O. Ogunlana et al. [20] classified fishes using the Support Vector Machine technique based on the shape features of the fish. They worked on training data of 76 fish and testing data of 74 fish, achieving an accuracy of 78.59%.

S. Cadieux et al. [21] generate the contours of fish in an unconstrained environment by deploying an infrared silhouette sensor which acquires contours of fish in a constrained flow. When the inputs are noisy, these features give a poor performance. The system has a reported classification accuracy of around 78%.

D. J. Lee et al. [22] removed edge noise and redundant data points through the development of a shape analysis algorithm. Critical landmark points were located using a curvature function analysis algorithm. A group of nine fish species was used to test this method, but the dataset used for the experimentation consisted of only 22 images.

M. Nery et al. [23] proposed a methodology based on feature selection. This method develops a feature vector by utilizing a set of descriptors obtained by analyzing the characteristic contribution of each individual descriptor to the overall classification performance. A classification accuracy of about 85% is reported.

III. PROPOSED METHODOLOGY

Here, we present a methodology for the discrimination of fish species. The dataset used for the concerned work is taken from [24]. The initial step taken by the system aims at removing the noise in the dataset. Applying image processing before the training step helps to remove underwater obstacles, dirt and non-fish bodies in the images. The second step uses a Deep Learning approach, the implementation of a Convolutional Neural Network (CNN), for the classification of the fish species.

In order to get the best results for feature identification and training of the CNN, it is important to provide input images with enhanced features as training samples. The pre-processing consists of the steps detailed below.
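As an overview of the noise-removal chain in the proposed pre-processing (Gaussian blurring, pyramid mean shifting, Otsu's thresholding and morphological operations), the sketch below writes out Otsu thresholding, erosion and dilation in plain NumPy; the blurring and mean-shift steps are typically library calls (e.g. OpenCV's GaussianBlur and pyrMeanShiftFiltering). This is an illustrative sketch under those assumptions, not the implementation used in the paper.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the gray level that maximizes the
    between-class variance of the image histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = gray.size
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = hist[:t].sum() / total          # background weight
        w1 = 1.0 - w0                        # foreground weight
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (np.arange(t) * hist[:t]).sum() / (w0 * total)
        mu1 = (np.arange(t, 256) * hist[t:]).sum() / (w1 * total)
        var = w0 * w1 * (mu0 - mu1) ** 2     # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def erode(binary, k=3):
    """Erosion: a pixel stays 1 only if every pixel under the k x k kernel is 1."""
    pad = k // 2
    padded = np.pad(binary, pad, constant_values=0)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.min(axis=(2, 3))

def dilate(binary, k=3):
    """Dilation: a pixel becomes 1 if at least one pixel under the kernel is 1."""
    pad = k // 2
    padded = np.pad(binary, pad, constant_values=0)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.max(axis=(2, 3))

# Toy image: a bright fish-like blob on a dark background.
img = np.full((20, 20), 30, dtype=np.uint8)
img[5:15, 4:16] = 200                    # the "fish"
t = otsu_threshold(img)
mask = (img > t).astype(np.uint8)        # sure foreground
clean = dilate(erode(mask))              # opening: remove speckle, rejoin parts
```

Erosion followed by dilation (a morphological opening) removes small noise speckles while restoring the fish silhouette to roughly its original extent.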
With the implementation of Otsu's thresholding, a gray-level histogram is created from the grayscale image for noise removal: if a pixel of the grayscale image is greater than the threshold value, it is considered white, otherwise it is declared black. The resulting image provides a sure foreground with the fish in focus.

The next step is the implementation of Morphological Operations, namely Erosion and Dilation, on the binarized image.

In the Erosion of the image, a kernel, precisely a fixed-size matrix, is convolved over the image. A pixel in the processed image is taken as 1 only if all the pixels under the kernel are 1; otherwise, the pixel is eroded (set to 0). Thus, in this step, the thickness of the foreground object (the fish) decreases.

The Dilation step implements the rule that a pixel in the processed image is 1 if at least one pixel under the kernel is 1. This step is used to rejoin the parts of the image broken by the noise-removal step.

Figure 1. Pre-processing with Otsu's binarization, Dilation and Erosion.

The second step of the procedure is the implementation of a Convolutional Neural Network (CNN) for the classification of fish species.

The input layer of the network takes the 100x100x3 original RGB image stacked with the 100x100x1 output of the pre-processing stage, making the input 100x100x4. This is followed by the intermediate hidden layers, a series of convolutional and pooling layers, and finally a fully-connected layer where we get the trained output. Neurons in a layer, say m, are connected to a subset of neurons in the previous layer (m-1), where the (m-1)-layer neurons have contiguous receptive fields, as shown in Fig. 2a.

Figure 2a. Graphical flow of layers showing the connection between layers.

Figure 2b. Graphical flow of layers showing sharing of weights.

Fig. 2b represents three hidden units. The weights of similar color are shared and are thus inferred to be identical. The gradient of a shared weight is the sum of the gradients of the parameters being shared. Such sharing allows the detection of features regardless of their position in the visual field. In addition, weight sharing tends to decrease the number of free learning parameters. Due to this control, a CNN tends to achieve better generalization on vision problems.

The Max-pooling layers act as non-linear down-sampling, in which the input image is partitioned into non-overlapping rectangles. The output of each sub-region is its maximum value.

The Convolution Layer is the first layer of the CNN. The structure of this layer is shown in Fig. 3. It consists of a convolution mask, bias terms and a function expression; together, these generate the output of the layer. The figure shows a 5x5x4 mask that performs convolution over a 100x100x4 input feature map. Upon application of 32 such 5x5x4 filters, the resultant output is a 96x96x32 matrix.

Figure 3. Processing the input feature with 32 filters and max-pooling.
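The layer shapes quoted for the convolution and pooling stages can be double-checked with a small helper. This is only the shape arithmetic for an unpadded ("valid") convolution and non-overlapping pooling, written for illustration; it is not the authors' network code.

```python
def conv_output_shape(h, w, k, n_filters):
    """A k x k mask slid over an h x w map without padding ('valid')
    yields an (h - k + 1) x (w - k + 1) plane per filter."""
    return (h - k + 1, w - k + 1, n_filters)

def maxpool_output_shape(h, w, n_planes, block):
    """Non-overlapping block x block max-pooling keeps the plane count;
    any edge remainder is discarded by the integer division."""
    return (h // block, w // block, n_planes)

# 32 filters of size 5x5x4 over the stacked 100x100x4 input
conv1 = conv_output_shape(100, 100, 5, 32)       # (96, 96, 32)
pool1 = maxpool_output_shape(96, 96, 32, 5)      # (19, 19, 32) if remainders are dropped
```

This confirms the 100x100x4 input with 32 filters of size 5x5 produces the 96x96x32 feature map stated above.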
The next layer in the network is a subsampling layer. The subsampling layer is designed to have the same number of planes as the convolution layer. The purpose of this layer is to reduce the size of the feature map: it divides the image into blocks of 5x5 and performs max-pooling. The subsampling layer preserves the relative information between features, not the exact relation.

The loss function used after the fully-connected layer is cross-entropy. Mathematically,

H_y'(y) := -Σ_i [ y'_i log(y_i) + (1 - y'_i) log(1 - y_i) ]

which can be explained as a (minus) log-likelihood for the data y'_i under the model y_i.

TABLE 1. ALGORITHM FOR PRE-PROCESSING AND CONVOLUTIONAL NETWORK

Figure 3 shows how the input features are processed with 32 filters and max-pooling. The above process is repeated twice more, once with 64 filters and then with 32 filters. The final output is connected to a fully-connected layer, which is further connected to an 80% dropout layer and lastly another fully-connected layer which classifies the images into the appropriate categories.

The aim of the training algorithm is to train the network such that the error between the network output and the desired output is minimized.

In the proposed method, we provide a comparison of three different activation functions applied to the layers of the CNN: a) ReLU, b) Softmax, c) tanh. Their mathematical definitions are given below.

a) ReLU (Rectified Linear Unit):

ReLU: h = max(0, a), where a = W*x + b
b) Softmax:

σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k)

c) tanh:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

IV. EXPERIMENTAL RESULTS

The proposed method was tested in Python on the Fish4Knowledge dataset [24] of 27,142 images, shown in Figure 2 below.

Figure 2. Dataset used

Activation Function Used    Overall Accuracy
ReLU                        96.29%
tanh                        72.62%
Softmax                     61.91%

TABLE 3. RESULTS
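For reference, the three activation functions compared in Table 3, and the cross-entropy loss defined in Section III, can be written directly in NumPy. This is an illustrative sketch of the formulas, not the code used for the experiments.

```python
import numpy as np

def relu(a):
    """h = max(0, a), applied element-wise."""
    return np.maximum(0.0, a)

def softmax(z):
    """sigma(z)_j = e^(z_j) / sum_k e^(z_k); shifting by max(z) avoids overflow."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def tanh(x):
    """tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def cross_entropy(y_true, y_pred, eps=1e-12):
    """H_y'(y) = -sum_i [ y'_i log(y_i) + (1 - y'_i) log(1 - y_i) ];
    predictions are clipped away from 0 and 1 so the logs stay finite."""
    y = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.sum(y_true * np.log(y) + (1.0 - y_true) * np.log(1.0 - y)))

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)                  # a probability vector summing to 1
```

Softmax turns the final-layer scores into class probabilities, which is why it appears both as a candidate activation in Table 3 and as the usual companion of the cross-entropy loss.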
Table 2 provides insights into the number of samples used for each different species covered. Table 3 shows the results as the accuracy of the correctly predicted test images in the sample, as given by Eqn. (5) below:

Accuracy = (Number of correctly predicted test images / Total number of test images) × 100    (5)

Species                        Number of Images
Plectroglyphidodon dickii      11312
Chromis chrysura               2683
Amphiprion clarkii             3593
Chaetodon lunulatus            4049
Chaetodon trifascialis         2534
Myripristis kuntee             190
Acanthurus nigrofuscus         450
Hemigymnus fasciatus           218
Chaetodon trifascialis         242
Neoniphon sammara              298
Abudefduf vaigiensis           198
Canthigaster valentini         148
Pomacentrus moluccensis        180
Hemigymnus fasciatus           190
Scolopsis bilineata            142
Neoniphon sammara              206
Scaridae                       149
Pempheris vanicolensis         156
Zanclus cornutus               129
Balistapus undulatus           221
Zebrasoma scopas               116
Total                          27,142

Table 2. The Species used in the Dataset

V. CONCLUSION AND FUTURE WORK

The proposed method for the classification of fish species gives an accuracy of 96.29%, which is very high compared with the other currently implemented methods for this application. Hence the proposed approach can certainly be used for real-time applications, as the computation time is 0.00183 seconds per frame. The method could not achieve 100% accuracy because some images could not be classified correctly due to the effect of background noise and other water bodies. We plan to improve our algorithm further by implementing image enhancement techniques to counter the features lost in the images.

VI. REFERENCES

[1] D. J. White, C. Svellingen, N. J. C. Strachan, "Automated measurement of species and length of fish by computer vision," Elsevier, 2006.
[2] F. Storbeck and B. Daan, "Fish species recognition using computer vision and a neural network," Elsevier, 2000.
[3] M. S. Nery, A. M. Machado, M. F. M. Campos, F. L. C. Padua, R. Carceroni, J. P. Queiroz-Neto, "Determining the appropriate feature set for fish classification tasks," Proceedings of the XVIII Brazilian Symposium on Computer Graphics and Image Processing, 2005.
[4] M. C. Chuang, J. N. Hwang, C. Rose, "Aggregated Segmentation of Fish from Conveyor Belt Videos," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1807-1811, 2013.
[5] N. Castignolles, M. Catteon, M. Larinier, "Identification and counting of live fish by image analysis," SPIE Vol. 2182, Image and Video Processing II, 1994.
[6] K. A. Mutasem, B. O. Khairuddin, N. Shahrulazman and A. Ibrahim, "Fish Recognition Based on Robust Features Extraction from Size and Shape Measurements Using Neural Network," Journal of Computer Science, Vol. 6, Issue 10, 2010.
[7] C. Spampinato, D. Giordano, R. D. Salvo, Y-H. Chen-Burger, R. B. Fisher, G. Nadarajan, "Automatic Fish Classification for Underwater Species Behaviour Understanding."
[8] Y. Nagashima, T. Ishimatsu, "A Morphological Approach to Fish Discrimination," IAPR Workshop on Machine Vision Applications, Nov. 17-19, 1998.
[9] B. Benson, J. Cho, D. Goshorn, R. Kastner, "Field Programmable Gate Array (FPGA) Based Fish Detection Using Haar Classifiers," American Academy of Underwater Sciences, March 1, 2009.
[10] A. Rova, G. Mori, L. M. Dill, "One fish, two fish, butterfish, trumpeter: Recognizing fish in underwater videos," IAPR Conference on Machine Vision Applications.
[11] C. Pornpanomchai, B. Lurstwut, P. Leerasakultham, W. Kitiyanan, "Shape- and texture-based fish image recognition system," Kasetsart J. (Nat. Sci.) 47: 624-634, 2013.
[12] M. T. Harandi, C. Sanderson, S. Shirazi, et al., "Graph Embedding Discriminant Analysis on Grassmannian Manifolds for Improved Image Set Matching," CVPR 2011, pp. 2705-2712, 2011.
[13] M. T. A. Rodrigues, M. H. G. Freitas, F. L. C. Pádua, et al., Pattern Analysis and Applications (2015) 18: 783. doi:10.1007/s10044-013-0362-6.
[14] S. Sclaroff, L. Liu, "Deformable Shape Detection and Description via Model-Based Region Grouping," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 5, May 2001.
[15] C. Spampinato, Y-H. Chen-Burger, G. Nadarajan, R. Fisher, "Detecting, Tracking and Counting Fish in Low Quality Unconstrained Underwater Videos," in Proc. 3rd Int. Conf. on Computer Vision Theory and Applications (VISAPP), Vol. 2, pp. 514-519, 2008.
[16] A. Hernández-Serna and L. F. Jiménez-Segura, "Automatic Identification of Species with Neural Networks," PeerJ 2 (2014): e563.
[17] A. K. Joginipelly, D. Charalampidis, G. Ioup, J. Ioup, C. Thompson, "Species-Specific Fish Feature Extraction Using Gabor Filters."
[18] D. Joo, Y. S. Kwan, J. Song, C. Pinho, J. Hey, Y. J. Won, "Identification of Cichlid Fishes from Lake Malawi Using Computer Vision," PLoS ONE 8(10): e77686, 2013. doi:10.1371/journal.pone.0077686.
[19] Y. H. Shiau, F-P. Lin, C-C. Chen, "Using Sparse Representation for Fish Recognition and Verification in Real World Observation," ICONIP '12: Proceedings of the 19th International Conference on Neural Information Processing, Part IV, pp. 75-82.
[20] S. O. Ogunlana, O. Olabode, S. A. A. Oluwadare, G. B. Iwasokun, "Fish Classification Using Support Vector Machine," African Journal of Computing & ICT, Vol. 8, No. 2, June 2015.
[21] S. Cadieux, F. Lalonde, F. Michaud, "Intelligent System for Automated Fish Sorting and Counting," IEEE IROS, pp. 1279-1284, 2000.
[22] D. J. Lee, S. Redd, R. Schoenberger, X. Xiaoqian, Z. Pengcheng, "An Automated Fish Species Classification and Migration Monitoring System," in Conf. of the IEEE Industrial Electronics Society, 2003, pp. 1080-1085.
[23] M. Nery, A. Machado, M. Campos, F. Padua, R. Carceroni, J. Queiroz-Neto, "Determining the Appropriate Feature Set for Effective Fish Classification Tasks," in SIBGRAPI, 2005, pp. 173-180.
[24] B. J. Boom, P. X. Huang, J. He, R. B. Fisher, "Supporting Ground-Truth Annotation of Image Datasets Using Clustering," 21st Int. Conf. on Pattern Recognition (ICPR), 2012.
[25] T. Wang and P. Shi, "Kernel Grassmannian distances and discriminant analysis for face recognition from image sets," Pattern Recognition Letters, 30(13): 1161-1165, 2009.