| dc.description.abstract |
1. The recent widespread adoption of drones for studying marine animals provides
opportunities for deriving biological information from aerial imagery. The large
scale of imagery data acquired from drones is well suited for machine learning
(ML) analysis. Development of ML models for analysing marine animal aerial
imagery has followed the classical paradigm of training, testing and deploying a new
model for each dataset, requiring significant time, human effort and ML expertise.
2. We introduce Frame-Level Alignment and Tracking (FLAIR), which leverages
the video understanding of Segment Anything Model 2 (SAM 2) and the vision-language
capabilities of Contrastive Language-Image Pre-training (CLIP). FLAIR
takes a drone video as input and outputs segmentation masks of the species of
interest across the video. Notably, FLAIR leverages a zero-shot approach,
eliminating the need for labelled data, training a new model or fine-tuning an existing
model to generalize to other species.
3. We trained state-of-the-art object detection and instance segmentation models
on a new dataset of Pacific nurse sharks. We show that FLAIR massively
outperforms these methods and performs competitively against two human-in-the-loop
approaches for prompting SAM 2, achieving a Dice score of 0.8. FLAIR readily
generalizes to other shark species without additional human effort and can be
combined with custom heuristics to automatically extract relevant information
including length and tailbeat frequency.
4. FLAIR has significant potential to accelerate aerial imagery analyses,
requiring markedly less human effort and expertise than traditional machine learning
workflows, while achieving superior accuracy and generalization performance.
By reducing the effort required for aerial imagery analysis, FLAIR allows
scientists to spend more time interpreting results and deriving insights about marine
ecosystems. |
|