Machine learning applied to big data from marine cabled observatories: A case study of sablefish monitoring in the NE Pacific

Bonofiglio, Federico; De Leo, Fabio C.; Yee, Connor; Chatzievangelou, Damianos; Aguzzi, Jacopo; Marini, Simone

doi:10.3389/fmars.2022.842946

ORIGINAL RESEARCH article

Front. Mar. Sci., 26 August 2022
Sec. Ocean Observation
Volume 9 - 2022 | https://doi.org/10.3389/fmars.2022.842946

Machine learning applied to big data from marine cabled observatories: A case study of sablefish monitoring in the NE Pacific

Federico Bonofiglio¹

Fabio C. De Leo^2,3

Connor Yee³

Damianos Chatzievangelou⁴

Jacopo Aguzzi^4,5

Simone Marini^1,5*

¹Institute of Marine Sciences (ISMAR), National Research Council of Italy, Lerici, Italy
²Ocean Networks Canada, University of Victoria, Victoria, BC, Canada
³Department of Biology, University of Victoria, Victoria, BC, Canada
⁴Functioning and Vulnerability of Marine Ecosystems Group, Department of Renewable Marine Resources, Instituto de Ciencias del Mar - Consejo Superior de Investigaciones Científicas (ICM-CSIC), Barcelona, Spain
⁵Stazione Zoologica Anton Dohrn (SZN), Naples, Italy

Ocean observatories collect large volumes of video data, with some data archives now spanning well over a few decades, and bringing the challenges of analytical capacity beyond conventional processing tools. The analysis of such vast and complex datasets can only be achieved with appropriate machine learning and Artificial Intelligence (AI) tools. The implementation of AI monitoring programs for animal tracking and classification becomes necessary in the particular case of deep-sea cabled observatories, as those operated by Ocean Networks Canada (ONC), where Petabytes of data are now collected each and every year since their installation. Here, we present a machine-learning and computer vision automated pipeline to detect and count sablefish (Anoplopoma fimbria), a key commercially exploited species in the N-NE Pacific. We used 651 hours of video footage obtained from three long-term monitoring sites in the NEPTUNE cabled observatory, in Barkley Canyon, on the nearby slope, and at depths ranging from 420 to 985 m. Our proposed AI sablefish detection and classification pipeline was tested and validated for an initial 4.5 month period (Sep 18 2019-Jan 2 2020), and was a first step towards validation for future processing of the now decade-long video archives from Barkley Canyon. For the validation period, we trained a YOLO neural network on 2917 manually annotated frames containing sablefish images to obtain an automatic detector with a 92% Average Precision (AP) on 730 test images, and a 5-fold cross-validation AP of 93% (± 3.7%). We then ran the detector on all video material (i.e., 651 hours from a 4.5 month period), to automatically detect and annotate sablefish. We finally applied a tracking algorithm on detection results, to approximate counts of individual fishes moving on scene and obtain a time series of proxy sablefish abundance. Those proxy abundance estimates are among the first to be made using such a large volume of video data from deep-sea settings. We discuss our AI results for application on a decade-long video monitoring program, and particularly with potential for complementing fisheries management practices of a commercially important species.

Introduction

Ocean exploitation at industrial levels increasingly threatened the ocean health (Danovaro et al., 2017; Rayner et al., 2019; Kildow, 2022), especially regarding fisheries (Kearney and Hilborn, 2022; Pentz and Klenk, 2022; Pitcher et al., 2022) and underwater mining activities (Levin et al., 2020; Filho et al., 2021). Fisheries in particular, although not a global issue per se (rather more efficiently treated on a local to regional level; Kearney and Hilborn, 2022), can contribute to a global reduction of biomass if not correctly managed (Palomares et al., 2020). Efficient management of the marine environment is promoted through international initiatives such as the UN’s Decade of Ocean Science for Sustainable Development¹ and the EU’s Marine Strategy Framework Directive (MSFD, 2008), where intelligent monitoring of marine resources is expected to play a central role (Beyan and Browman, 2020; Malde et al., 2020; Rountree et al., 2020; Aguzzi et al., 2021).

Estimates of species composition and relative abundance of a local community are influenced by the temporal patterns in the behavioral rhythms of fauna at diel (i.e., 24-h) and seasonal scales (Aguzzi and Company, 2010; Aguzzi et al., 2011b; Aguzzi et al., 2015b). This raises the need for high-frequency, long-term, datasets of the stock levels and demographic indices of fishery targets, whose generation is difficult with the use of expensive vessel-based sampling methodologies (Naylor, 2010). Moreover, in zones such as Marine Protected Areas (MPAs), invasive sampling of key species is often prohibited (Vigo et al., 2021).

Bottom trawling is to date the most reliable stock assessment method for demersal resources (Ovando et al., 2022) but it’s temporally intensive application would have a high environmental impact (e.g., Hiddink et al., 2006; Jamieson et al., 2013; Flannery and Przeslawski, 2015; Colloca et al., 2017; Costello et al., 2017; Sciberras et al., 2018; Rousseau et al., 2019; De Mendonça and Metaxas, 2021). Imagery-based sampling methods are still not as widespread (Bicknell et al., 2016), although they are successfully applied for gathering species composition and abundance data, for instance along usage of Baited Remote Underwater Video Stations (BRUVs), stereo-BRUVs (with stereo cameras for accurate fish sizing), and more recently Deep-BRUVS (for deep-water deployments; e.g., Langlois et al., 2018; Withmarsh et al., 2017). Nowadays, such fixed-point cabled observatory stations can collect detailed biological and environmental data, allowing for species abundance estimates in response to environmental changes (Aguzzi and Company, 2010; Aguzzi et al., 2012; Aguzzi et al., 2015b; Chatzievangelou et al., 2016; Doya et al., 2017; Chatzievangelou et al., 2020).

In order to fully unlock the potential of such infrastructures, appropriate analytical tools are needed to automatically and quickly process vast amounts of generated data (Schoening et al., 2012; Osterloff et al., 2016; Aguzzi et al., 2019; Osterloff et al., 2020; Zuazo et al., 2020). Artificial Intelligence (AI) procedures can provide important fishery-independent data (Marini et al., 2018a; Marini et al., 2018b; Malde et al., 2020; Lopez-Vazquez et al., 2020; Aguzzi et al., 2021; Harris et al., 2021) collected by infrastructures capable of supporting video data acquisition and processing (Jahanbakht et al., 2021).

Sablefish (Anoplopoma fimbria) is a demersal fish species of the Pacific coast of North America (depth range 300-3000 m; Orlov, 2003), which supports important commercial fisheries (Warpinski et al., 2016; Riera et al., 2020). Sablefish populations include resident and migrating individuals performing both horizontal and vertical movements (Jacobson et al., 2001; Maloney and Sigler, 2008; Morita et al., 2012; Hanselman et al., 2015; Goetz et al., 2018; Sigler and Echave, 2019) across large geographic ranges (Chapman et al., 2012). In British Columbia (BC), sablefish stocks have shown indication of decline, with a recent observed resurgence attributed to some years of stronger recruitment (Workman et al., 2019). Along the BC coast, sablefish appear to move horizontally, back and forth along the canyon axis on a daily basis, in search of prey (Doya et al., 2014; Chatzievangelou et al., 2016). Locally, seasonal trends in abundance have been also documented by the high-frequency observations provided by the North-East Pacific Undersea Networked Experiments (NEPTUNE) observatory’s seafloor video cameras (Doya et al., 2017; Chauvet et al., 2018).

There are available tools to automatically process large quantities of videos for extracting biological data. A number of methodologies have been proposed for fish species recognition and classification over the last two decades, but the great variability of either species morphologies, or the conditions in which the videos are captured, is still a major challenge for automated processing (e.g., Matabos et al., 2017; Marini et al., 2018a; Marini et al., 2018b; Ottaviani et al., 2022). These automated approaches span a wide range of topics within the AI and computer vision based literature (e.g., Hsiao et al., 2014; Nishida et al., 2014; Wong et al., 2015; Chuang et al., 2016; Tills et al., 2018; Harrison et al., 2021; Yang et al., 2021; Liu et al., 2021; Sokolova et al., 2021a; Sokolova et al., 2021b). A preliminary attempt to automatically detect sablefish in Barkley Canyon was carried out by Fier et al. (2014) using a supervised image segmentation approach, whose detection efficiency was later compared with manual annotations performed by biologist experts, students and the general public using an online citizen science platform (DigitalFishers, http://dmas.uvic.ca/DigitalFishers), (Matabos et al., 2017). However, those initial attempts highlighted that the fish detection algorithms were still rudimentary, being significantly outperformed by trained human eyes, and therefore calling for significant improvement. Nevertheless, the recent developments based on Deep Learning (DL) Convolutional Neural Network (CNN) processing methods demonstrated high accuracy performance and reliability in completing fish recognition and classification tasks (Konovalov et al., 2019; Lopez-Vazquez et al., 2020; Yang et al., 2021; Lopez-Marcano et al., 2021; Zhao et al., 2021; Ottaviani et al., 2022).

DL has recently emerged as an innovative field in AI for language processing, computer vision, and the alike (Goodfellow et al., 2016; Han et al., 2020; Langenkämper et al., 2020; Malde et al., 2020; Ottaviani et al., 2022). CNNs are successfully used in computer vision because, thanks to their interconnected structure, they can automatically capture many hidden features of the input data, generating information on a number of valuable image features (shape, texture, and so on) with little manual intervention (Simonyan and Zisserman, 2015; LeCun et al., 2015; Girshick, 2015; Ren et al., 2015). In general terms, a CNN is a complex statistical model of some outcome (here images) which can be graphically represented as interconnected nodes, stacked into several layers. Each node is assigned one or more parameters to be optimized through a training phase. Since the whole network accounts for many nodes and often millions of parameters, DL methods can be computationally intensive (LeCun et al., 2015). Although very useful, such techniques alone are yet not sufficient to fulfill all scopes of automated monitoring programs centered on megafauna quantification, and still require some customization effort.

In the present study, we combine existing DL approaches with object tracking techniques, in order to produce reliable counts of sablefish in deep-sea settings of slope and submarine canyon habitats of the NE Pacific within variable environmental conditions and subjects in movement. Making use of the NEPTUNE cabled observatory, an abundance time-series extracted from video data was generated through an AI-based pipeline proposed for the automatic detection, classification, and counting of sablefish. In general, the proposed approach is aimed at supporting the stock-related assessment metrics and monitoring programs of deep sea commercial species with ancillary data, according to the increasing trend of networks of fixed point observatories (Painting et al., 2020; Rountree et al., 2020; Rountree et al., 2020; Aguzzi et al., 2021).

2. Methods

2.1 Cabled observatory data

The NEPTUNE cabled observatory operated by Ocean Networks Canada (ONC) presently represents one of the best technologically-equipped networks to undertake fish communities monitoring along the Pacific coast of North America (Aguzzi et al., 2020a). One of its nodes, located in Barkley Canyon², consists of several cabled instrumented platforms that span a maximum linear distance of ~15 km, and a depth range of 400 to 985 m, overlapping with the habitat range of the greatest sablefish abundance (Goetz et al., 2018; Kimura et al., 2018; Aguzzi et al., 2020a). A total of 5 fixed instrumented platforms and a mobile crawler (with a 70-m radius range) are equipped with a suite of oceanographic and biogeochemical sensors in addition to video cameras mounted on pan and tilt units (Figure 1).

FIGURE 1

Figure 1 Map (A), schematic (B) and photographs depicting the study area in Barkley Canyon, NE Pacific, and the NEPTUNE cabled observatory infrastructure. Video imagery used in this study was limited to 3 out of 5 long-term monitoring sites, Upper Slope (420 m), Canyon Axis (985 m) and Node (620 m). Screen grabs (labeled Upper Slope, Node and Axis) represent the variable video field of view conditions of the three locations used in our study.

The study took place at three sites from the Barkley Canyon, all equipped with a video-camera (Table 1 and Figure 1A). All the high-definition cameras recorded at 1080p horizontal line pixel resolution and at ~23 frames per second (fps). The cameras were mounted on galvanized steel tripods and attached to ROS-485 Pan and Tilt units, with a pair of dimmable ROS LED lights (100 W, > 406 lm), which provided illumination during video recording. Two parallel laser-beams, 10 cm apart, provided scaling of the seafloor. All three cameras had specific and optimal field of views adjusted after deployment, which remained unaltered throughout the study period (Table 1). Due to variable tilt angles and tripods, each with slightly different heights above the seabed, the imaged seabed area varied among the cameras, ranging 5-10 m² (Table 1). The three cameras synchronously recorded 5-minute videos at each UTC hour. The synchronization in image acquisition was of particular relevance to track fish shoals and spatial displacements of benthic megafauna within a spatially-coherent camera network along the continental margin areas (Aguzzi et al., 2011a; Aguzzi et al., 2020b).

TABLE 1

Table 1 Camera specifications and deployment details according to the locations shown in Figure 1.

2.2 Data analysis work-flow, preparation of the training set

Archived video data were accessed using the ONC repository³ and Application Programming Interface (API) using a Python client library⁴ targeting the study period, corresponding to September 18th 2019 to February 3rd 2020 (136 days). Next, in order to train a classification model for recognizing sablefish individuals in the videos, we generated a ground-truth dataset consisting of a number of video frames selected from the whole video material and containing examples of sablefish individuals. Figure 2 schematically describes the workflow required for the data analysis up to the endpoint production (estimated fish count time series).

FIGURE 2

Figure 2 Schematic workflow of the video imagery data analysis. Frames from the acquired video are selected, manually annotated and used for the training/validation of the sablefish classifier; Then the trained classifier is applied to the footage not used in the training/validation phase producing the bounding boxes of the detected relevant subjects; then the tracking algorithm is used to track the relevant subject moving within the field of view with the aim of producing reliable counts of the framed individuals.

The ground-truth data set was prepared by two annotators on two different groups of videos. One annotator manually searched and selected video material which resulted in annotation of 3189 video frames, mostly from the Barkley Node location. Another annotator additionally annotated 458 video frames, resulting from a simple random selection of a pool of frames evenly distributed along all study sites. The total ground-truth dataset corresponds to 3647 frames subdivided between training and validation as described in Table 1. Moreover a 5-Fold Cross-Validation within the training/validation part of the dataset has been used for the evaluation of the classification performance, as described in the Result section.

Annotations were made by drawing a bounding box around each sablefish individual in the image, and by specifying the class (i.e., species) of that individual (i.e., only one in the present application). Hence, the ground-truth data ready for computer vision training consisted of a set of paired images and text files containing bounding box coordinates and class labels for each individual. All annotations were made using the labeling software labellmg⁵.

2.3 Classification model

In recognizing the rapidly evolving landscape in DL computer vision methods, and in critically evaluating the most recent approaches, we focused on models readily available to the average user with relatively limited computation resources. The “You Only Look Once” (YOLO) DL neural network (Redmon et al., 2016) was ultimately selected, after recently drawing attention for its good performances and relatively light computational burden (Aziz et al., 2020). Relative to other successful CNNs, YOLO does manage to process the data in one step, by introducing yet more automation in the detection and classification process (Koirala et al., 2019),. Although many versions of the YOLO architecture exist (e.g. YOLOv3⁶ and YOLOv7⁷), experiments have been performed with the YOLOv5 software⁸.

2.4 Transfer learning

The classification task involves the recognition of a relevant subject with respect to the image background. To this end, the required, expensive, training step can be simplified by using a transfer learning approach, where a classifier is trained on a general purpose dataset and then specialized on a dataset defined for the specific application context (Tan et al., 2018). Accordingly, we pre-trained the network on the Common Objects in COntext dataset (COCO⁹), which is a popular general purpose dataset containing annotated images of animals (without fishes), persons and various objects. The pre-train phase was aimed at lending the network a baseline discrimination for a variety of image features (e.g., shapes, textures), as opposed to training the network on the sablefish dataset from scratch which is time consuming and needs a huge amount of ground-truth examples.

Only in a second step, we used the dataset obtained from the manual annotations for specializing the pre-trained classifier on sablefish-related features the acquisition conditions varying across the ONC sites (i.e, seafloor illumination, texture, color, etc.) and the different species present in the images e.g. Pink Urchin (Strongylocentrotus fragilis), Tanner crab (Chionoecetes tanneri), hagfish and gastropods species). This additional training resulted in a classifier specifically calibrated for sablefish detection. The classifier was fine-tuned via further training on an extended dataset including outputs by both manual annotators, which increased representativeness of the overall study area conditions.

For the performance assessment, we measured the Average Precision (AP) at 50% overlap between the ground-truth and the inferred bounding boxes. AP is the area under the curve resulting from plotting the classification Recall and Precision for sablefish (Fawcett, 2006). Recall is the number of detected True Positives (TP) over all ground truth annotations,

Recall = \frac{T P}{(T P + F N),}

where FN is all False Negatives. Recall is also known as sensitivity. Precision is all TP over all detections:

Precision = \frac{T P}{(T P + F P),}

where FP is all False Positives. Precision is also known as a positive predictive value.

We make the weights obtained by this training procedure and used for detection and classification analyses also freely available for reproducibility of our results (see next Section below). Training was performed on Google Colab¹⁰.

2.5 Automatic detection and classification analysis

In this final step, we applied the trained YOLO model on the stored video data, to produce automatic detection and classification of all sablefish appearing on scene during the 4.5 month period and at the three study locations. The output for one single video clip was an automatically produced annotation text file, for each video frame, including bounding box coordinates, object class, and detection confidence for each detected object in the frame (ranging 0-1). To speed up the video footage analysis, without losing relevant information, we kept only 50% of the original frame size and only every 20th frame in the four minutes video clip.

While the YOLO model is necessary to detect and classify every sablefish appearing in each video frame, this alone cannot provide estimates of the number of unique individuals appearing in the video clip. Therefore, we needed to extend the computer vision pipeline to add a tracking step, aimed at producing an estimate of the number of individual subjects appearing in the video. Such a step reduces the multiple counts of individuals moving along the field of view. (. To this end, we used the bounding box coordinates automatically generated by the YOLO for tracking each individual fish along the time sequence of video frames.

Although many approaches exist for object tracking (Fiaz et al., 2019; Du and Wang, 2022) we decided to implement a simple tracking algorithm based on the Euclidean distance among the bounding boxes of the individuals detected by the classification algorithm (e.g. Islam, 2020; Azimjonov and Özmen, 2022). We related two fishes between two consecutive time frames by taking the corresponding box coordinates (centroids) and by computing their Euclidean distance. We repeated this for any pair of centroids, and the pair with the smallest Euclidean distance resulted in attributing one track to the same individual moving in the field of view. Therefore, the estimated total number of tracks in one video is an approximation of proxy abundance recorded by the video. One fish was tracked until it disappeared from the scene for a maximum number (“m”) of consecutive empty frames, which was a key parameter of this algorithm. This method might cause fishes to swap tracks if centroids of two different fishes come too close and cross. However, this alone does not critically alter the estimation of the total number of individuals in one video, and therefore is an acceptable drawback of this tracking method.

2.6 Machine learning output assessment against manual visual counts

We plotted the sablefish hourly count time series calculated from the AI pipeline output over the entire 4.5 month monitoring period at each observatory platform/site. The sablefish count estimates should be considered as a proxy of fish abundance measurements, because observations arise from a non-probability sampling scheme and because of intrinsic approximation errors of the detection pipeline. We computed summary statistics such as mean sablefish count over the study period and linear time trend.

We validated the machine-learning results against independent visual counts manually obtained via the widespread MaxN estimation method (Priede et al., 1994; Harvey et al., 2007; Yeh and Drazen, 2009; Schobernd et al., 2014) on 30-second video clips extracted from the original 5-min video files, and starting at elapsed 1 min 45 sec into each video to ensure standardized effects of artificial lighting on the sablefish startling behavior and attraction in the field of view (Doya et al., 2014; De Leo et al., 2018). MaxN consists of obtaining the maximum number of fish (peak abundance) observed in any single frame during the viewing interval (here 30 sec.). It is also a proxy for abundance, because it typically underestimates its true value. Therefore, we are only interested in comparing the trend between the AI-based and the manual counts time-series, as opposed to their absolute value. For Barkley Axis and Upper Slope, only 30 days worth of video clips were used for the manual count. We computed the Pearson correlation between the machine-learning and the manual time-series results, by matching them by date.

All the scripts needed for downloading and processing the ONC data are freely available on GitHub (Bonofilgio, 2021).

3. Results

3.1 Training and validation

The volume of data analyzed consisted of 9772 video clips (Node = 3313; Axis = 3154; Slope = 3305) over the 4.5 month period, for a total of ~650 hours of video recording and ~1 TB of data. Among all the available video-clips 3647 were used for training/validation and test and the remaining 6125 video-clips were used for comparison between automated and manual counts. At the end of the detection pipeline, we had a sablefish count time series of ~600 KB, which stresses the stark data reduction needed to obtain the desired end information. The manual annotation produced a total of 9205 tagged unique sablefish individuals distributed in 3647 image-frames, used for training (n = 2917) and testing (n = 730) the YOLO (Table 1).

The obtained classifier resulted in a 92% AP on test data and a 93% (sd: 3.7%) 5-fold cross-validated AP (mean precision: 99%, sd: 0.4%; mean recall: 52%, sd: 1.2%)¹¹. We deemed these results as good and kept this model as our final detector and classifier.

3.2 Detection and tracking results

Reducing video size and frames number prior to automatic detection, resulted in decreasing data size by almost 14 times, which also sped up detection computations. In particular, only one frame every 20 was analyzed, reducing the processing time of a whole video clip from about 20 minutes to 3 minutes, on an 8 CPU Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz computer.

Although the problem of uncertainty and confidence within the object classification context is quite complex and still debated (Mena et al., 2022), we run the tracking algorithm on all detection results, filtering only detections with the confidence value returned by YOLO greater or equal than 80% for the Node and Axis sites and 95% for Slope. The larger confidence value used for the Slope was needed to better discriminate the very few individuals of sablefish occurring in this site against other fauna (e.g., dark blotched rockfish), present in much greater abundance here compared to the other two sites. In total, this processing step produced a count time series of about 600 KB of size only. Examples of sablefish detections from two consecutive frames of video clips of Barkley Axis and Node are shown in Figure 3. The labeled boxes are the result of the automatic YOLO detection and classification. Note the classification confidence metric displayed in each sablefish bounding box (ranging from 0-1), which we use as a filter to generate more conservative estimates when needed. Green numbered points over each individual are the result of the tracking algorithm. Using the examples in Figure 3, the procedure counted 105 and 32 sablefish individuals at the end of each respective video clip for Axis and Node. These two entire video clips are available as Supplementary Material (Supplementary Videos 1 and 2). For Barkley Axis, where the video camera had multiple pan and tilt field of view positions, the YOLO and the tracking algorithm had similar efficiency under variable artificial illumination conditions as well as with variable amounts of seafloor present in the image background (Figure 4).

FIGURE 3

Figure 3 Examples of the outcome of detection, classification, and tracking of sablefish individuals. Bounding boxes and the corresponding confidence levels are the result of the YOLO detection and classification algorithm, where the classification confidence ranges from 0 to 1. Green numbered points (barely visible in the images) are the result of the tracking algorithm, approximating the count of unique individuals appearing on video. The images at the top show two video frames at 5 and 7 seconds elapsed from a video clip recorded in Barkley Axis on October 17th 2019 at 11:00 am UTC. The images at the bottom show two video frames at 31 seconds and 2 minutes elapsed from a video clip recorded in Barkley Node on October 21st 2019 at 8:04 am UTC.

FIGURE 4

Figure 4 Examples of sablefish detection and classification at Barkley Axis (NE Pacific), showing multiple pan positions (camera moving laterally) and multiple tilt positions (camera moving vertically). The algorithm had to work under multiple illumination and seafloor configurations.

3.3 Derived time-series of sablefish proxy abundance

Using the tracking algorithm, estimates of sablefish proxy abundance from each video clip were extracted, resulting in a 4.5 month long proxy abundance time-series (Figure 5). A total of 263,780 sablefish individuals were counted across all sites (Node: 107,366; Axis: 156;414; Slope: 3), with mean counts of 32.4 (sd: 24.4) and 49.6 (sd: 26.3) per videoclip for Node and Axis, respectively.

FIGURE 5

Figure 5 Automatically detected count time series at Node (A) and Axis (B) sites. Data from Upper Slope is not shown since there were only 4 instances where sablefish was detected.

The time series at the Node site showed fluctuating behavior (Figure 5A) and an overall significant linear increase of 12 ± 0.25 sablefish per month on average (R² = 0.4), especially due to the rise in counts starting from January 13th. The time series at the Axis site had a concave shape (Figure 5B) with an initial sharp increase and a slow decrease at the end. Therefore, the time series had an almost null linear change in average number of sablefish per month with little explained variation (slope = -1.3^-6 ± 1.4^-7, R² = 0.027), and therefore a poor linear fit. Filter increment at Upper Slope site resulted in counting only three sablefish between November 17th-27th 2019. We checked videos at these dates confirming the presence of sablefish individuals, although the automatic count was slightly more conservative. We also directly checked for counts greater than 100 at Barkley Node site after day 120, manually confirming the numbers.

The results obtained from the automatic sablefish counts showed good agreement with results from the manual counts, both for Barkley Node (Figure 6) and Axis (Figure 7). However, it is worth noting that the absolute sablefish counts cannot be compared between the two methodologies, in particular because the manual counts were obtained from much shorter video clips (30 sec) instead of the 5 min. long videos used in the YOLO and tracking algorithms combined. Thus, we are mostly interested in comparing the temporal trends between the two series (see Figures 6, 7) and their correlation. For site Node and Axis, we measured 88% and 81% Pearson correlation, respectively (p-value< 0.0001 in both cases). Manual counting at Upper Slope yielded 4 sablefish between September 19th and October 21st, 2019. Therefore, both automatic and manual counting at Upper Slope resulted in False Negatives. Finally, the result of the time trend was robust to variation of the main parameter “m” of the tracking algorithm (Figure 8).

FIGURE 6

Figure 6 Trend validation of automatically detected counts (A) at Node site compared to manually detected counts (B).

FIGURE 7

Figure 7 Trend validation of automatically detected counts (A) at Axis site compared to manually detected counts (B), for the period September-October 2019 only.

FIGURE 8

Figure 8 Sensitivity analysis of automatic count calculations, by varying the parameter “m” (maximum number of consecutive empty frames to declare a track lost) of the tracking algorithm. Increasing the value of “m” leads to more conservative results. Sites: Node (A) and Axis (B).

4. Discussion

The results obtained by our machine learning (deep-learning combined with tracking algorithm) approach successfully generated a pipeline of automated detection, classification, and proxy abundance estimation of sablefish. The good agreement in temporal trends of the automated and manually annotated sablefish counts demonstrates the value of this machine learning approach for measuring the local sablefish population dynamics in Barkley Canyon and adjacent Upper Slope. In particular, the machine learning pipeline was successful in detecting sablefish under highly variable environmental conditions; i.e., a depth gradient (420-970 m) which generally translates into variable regimes of seafloor detritus input and suspended particulate matter (sediments and organic matter). Those variable conditions, in turn, have a direct effect on the overall quality of the video imagery (seafloor color, illumination field, etc.). Furthermore, the automated identification of only a few individuals of sablefish at the shallower site (Upper Slope), which overall agrees with the manual identification of sablefish individuals, provides us with valuable information about the upper habitat boundaries for this deep dwelling species.

4.1 Methodological remarks

While the synergy between the video camera infrastructure and machine learning has the big advantage of harnessing vast amounts of monitoring data (Boom et al., 2014; Malde et al., 2020; Beyan and Browman, 2020), we shall nevertheless recall that the resulting animal count estimates are only a proxy for total abundance due to both intrinsic and contingent limitations. Intrinsic limitations deal with the difficulty of any automated detection method to perform a capture-recapture approach; i.e., to discern a specific individual across consecutive video clips, or when a single individual exits and reenters the framed scene (Francescangeli et al., 2022).

In any case, this problem is common with this type of video-monitoring studies and data, that might be processed with the MaxN method for manual counting (Martinez et al., 2011; Linley et al., 2017). Here, the contingent limit was due to the camera-based recording, which did not follow any probability scheme and is therefore open to possible bias. Other possible sources of selection bias include disturbance effects of the newly positioned infrastructures and their artificial lights (Rountree et al., 2020).

Although the abundance of specimens cannot be exactly estimated, the proposed methodology provides consistent temporal dynamics at the three sites that can be used for further multivariate analysis involving physical and bio-chemical variables aimed at explaining and comparing the different dynamics in the three sites. Moreover, the raw counts provided by the methodology can be further used in a variety of ways. For instance, for applying MaxN or other estimators to the automatic counts obtained.

The classifier’s 5-fold cross-validation AP of 93% is mainly driven by very high precision values. In other words, the classifier shows good predictive performance when confronted with new data characterized by changing environmental conditions, species make-up, and animal movement dynamics. On the contrary, the observed lower values of recall (52%), or the higher misclassification of ground-truth annotations, is acceptable and compatible with a low degree of model over-fitting (and high generalization capability), which is desirable in our real-world conditions.

Nevertheless, the precision of the classifier did vary from site to site and in particular we observed a drop in precision at the Upper Slope site. This phenomenon required a higher confidence threshold during tracking in order to eliminate the wrong false positive detections possibly caused by the underrepresentation of examples in the training set for the Upper Slope site.

4.2 Practical applications

The developed AI pipeline helps the transformation of a cabled observatory’s camera into a biological sensor for the automated delivery of numerical information about the observed ecosystem (Aguzzi and Company, 2010; Aguzzi et al., 2011; Rountree et al., 2020). Such “intelligent” observatories become appealing to process the rising amount of imaging data for the production of fishery-independent metrics supporting management of economically-relevant stock resources (Cappo et al., 2004; Bicknell et al., 2016; Langlois et al., 2018; Aguzzi et al., 2021). Moreover, the use of the transfer learning approach combined with the tracking algorithm makes the proposed pipeline usable in different application contexts where footage needs to be analyzed and complementing invasive methods such as trawling, and it is particularly suited for megafauna monitoring in combination with optoacoustic or molecular life-tracing technologies (Aguzzi et al., 2019; Levin et al., 2019; Danovaro et al., 2020).

Our work presents a further step towards the integration of automated tracking and classification of animals with AI routines embedded in robotic platforms (Aguzzi et al., 2022). The automation achieved here allows cameras to deliver time-series of data for a species of commercial interest, in agreement with the socioeconomic need for growing permanent robotic networks worldwide (Danovaro et al., 2017; Aguzzi et al., 2019 Science). Imaging approaches are currently the chief technological application in ecological monitoring (Durden et al., 2016), but the transformation of cameras into true sensors is facing difficulties due to the lack of automated routines to extract relevant ecological information on species and their abundances (Durden et al., 2016; Bicknell et al., 2016; Aguzzi et al., 2019). Bandwidth is another limiting factor for the transmission of images from remote autonomous platforms to land stations, therefore the next steps would include embedding the processing steps in the platforms in order to reduce the output to numeric data which is easier to transfer. Fishery-independent stock assessment programs (e.g., the Norway lobster as an iconic fishery resource for the EU monitored by Under-Water Tele-Vision surveys; Aguzzi et al., 2021) can benefit from such advancements. Moreover, as the envisaged development of industrial-level activities such as deep-sea mining raises the need for effective and efficient assessment of their environmental impact, data achieved with image-based monitoring by networks of fixed and mobile platforms (Rountree et al., 2020 OMBAR; Weaver et al., 2022) can provide fast and flexible solutions. Finally, the same methodologies could be applied to other strategic sectors of marine industry such as aquaculture monitoring (e.g., Muñoz-Benavent et al., 2018), as well as offshore hydrocarbon and energy-generating platforms (e.g., Gates et al., 2019; McLean et al., 2020).

5. Conclusions

This pilot study showed that the methodology has overall benefits in automatically processing big data, which is more often generated in marine sciences applications. Automatically generated proxy abundances of the sablefish subpopulation at Barkley Canyon, are interpretable and prove to be reasonable, while validated with manually generated counts. Therefore, we believe the full potential of the approach can be next used to process yet larger amounts of archived video data and produce a 10 years time series of proxy abundance, which could support the comprehension of long-term stock change, especially in light of rapidly changing climatic and environmental conditions.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material. We made all training computations on the Google Colab notebook¹². To repeat the training, please clone the Google Drive repository containing the annotated data at¹³. All detection, tracking, and time-series analyses are freely available for reproduction at¹⁴.

Ethics statement

Ethical review and approval was not required for the animal study because this was an observational study using video data recording animals in their wild habitat.

Author contributions

SM, FC, and JA conceptualized the project. FC pre-processed the video footage to be analyzed and provided the expertise of the ONC infrastructure; SM and FB conceived the automated image analysis approach based on machine learning. FB and CY manually annotated a subset of the video data; FB implemented the algorithms for the video data analysis; FC, DC, and JA provided their biological expertise on the data tagging and the result interpretation and validation. All authors contributed to the article writing and revision and approved the submitted version.

Funding

Ocean Networks Canada is funded through Canada Foundation for Innovation-Major Science Initiative (CFI-MSI) fund 30199. Ideas for this paper resulted from discussions during the international workshop “Marine cabled observatories: moving towards applied monitoring for fisheries management, ecosystem function and biodiversity”, funded by Ocean Networks Canada and co-hosted by ICM-CSIC, in Barcelona, Spain on 4–5 October 2018.

Acknowledgments

This work was developed within the framework of the Research Unit Tecnoterra (ICM-CSIC/UPC) and the following project activities: ARIM (Autonomous Robotic sea-floor Infrastructure for benthopelagic Monitoring; MarTERA ERA-Net Cofound); RESBIO (TEC2017-87861-R; Ministerio de Ciencia, Innovación y Universidades); PLOME (PLEC2021-007525/AEI/10.13039/501100011033; Ministerio de Ciencia, Innovación y Universidades); JERICO-S3: (Horizon 2020; Grant Agreement no. 871153); ENDURUNS (Research Grant Agreement H2020-MG-2018-2019-2020 n.824348). We also profited from the funding of the Spanish Government through the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2019-000928-S). We are also thankful for the support from ONC’s marine and digital operations staff for servicing and maintaining the NEPTUNE observatory and for the curation and quality control of all video imagery data used in this study.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.2022.842946/full#supplementary-material

Footnotes

^ https://en.unesco.org/ocean-decade
^ https://www.oceannetworks.ca/multimedia/maps/
^ https://data.oceannetworks.ca/,
^ https://wiki.oceannetworks.ca/display/O2A/Oceans+3.0+API+Home
^ https://github.com/tzutalin/labelImg
^ https://github.com/ultralytics/yolov3
^ https://github.com/jinfagang/yolov7
^ https://github.com/ultralytics/yolov5
^ https://cocodataset.org
^ https://colab.research.google.com/
^ Section 5-7 of the online Colab notebook, See data availability statement below.
^ https://drive.google.com/drive/folders/1OnY2MlhUcQjefCfsU8cgxl_aJzFDr6GA?usp=sharing
^ https://drive.google.com/drive/folders/10JtFCofWq16-foQFiOlTCQk4bXk2Vom7?usp=sharing
^ https://github.com/bonorico/analysis-of-ONC-video-data

References

Aguzzi J., Bahamon N., Doyle J., Lordan C., Tuck I. D., Chiarini M., et al. (2021). Burrow emergence rhythms of nephrops norvegicus by UWTV and surveying biases. Sci. Rep. 11 (1), 5797. doi: 10.1038/s41598-021-85240-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Aguzzi J., Chatzievangelou D., Company J. B., Thomsen L., Marini S., Bonofiglio F., et al. (2020a). Fish-stock assessment using video imagery from worldwide cabled observatory networks. ICES. J. Mar. Sci. 77, 2396–2410. doi: 10.1093/icesjms/fsaa169

ORIGINAL RESEARCH article

Machine learning applied to big data from marine cabled observatories: A case study of sablefish monitoring in the NE Pacific

Introduction

2. Methods

2.1 Cabled observatory data

2.2 Data analysis work-flow, preparation of the training set

2.3 Classification model

2.4 Transfer learning

2.5 Automatic detection and classification analysis

2.6 Machine learning output assessment against manual visual counts

3. Results

3.1 Training and validation

3.2 Detection and tracking results

3.3 Derived time-series of sablefish proxy abundance

4. Discussion

4.1 Methodological remarks

4.2 Practical applications

5. Conclusions

Data availability statement

Ethics statement

Author contributions

Funding

Acknowledgments

Conflict of interest

Publisher’s note

Supplementary material

Footnotes

References

People also looked at