Categories
Academic General

Few-Shot Segment Anything Model

By leveraging intricate data generation pipelines, the Segment Anything Model (SAM) excels at interactive segmentation. However, SAM has shown weaknesses in specific scenarios, primarily due to the ambiguity of single-point prompts. For example, when prompted with a single click on a person's torso, the model can produce a mask of the whole person, but also of individual parts such as the upper body or the shirt. To mitigate this issue in an interactive segmentation scenario, we allow SAM to use information from a few example (image, mask) pairs without updating its weights, i.e. few-shot prompting.

Presented at Streiflicht 2023 at Ulm University, together with a custom annotation tool.

Categories
Academic General

Scene Graph Conditioning in Latent Diffusion

Diffusion models excel at image generation, but text prompts alone give little fine-grained semantic control. Additional techniques have been developed to address this limitation. However, conditioning diffusion models solely on text-based descriptions remains challenging because text is ambiguous and lacks structure. In contrast, scene graphs offer a more precise representation of image content, which makes them better suited for fine-grained control and accurate synthesis in image generation models. Because paired image and scene-graph data is scarce, fine-tuning large diffusion models is challenging. We propose multiple approaches to tackle this problem using ControlNet and Gated Self-Attention.
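To illustrate why a scene graph is more structured than a caption, here is a minimal sketch of a scene graph stored as objects plus (subject, predicate, object) relations. The class names, field names, and the example scene are hypothetical and chosen only for illustration; this is not the representation used in the project.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """A directed edge between two objects, referenced by index (hypothetical schema)."""
    subject: int
    predicate: str
    obj: int


@dataclass
class SceneGraph:
    """Objects are nodes; relations are labeled, directed edges between them."""
    objects: List[str]
    relations: List[Relation]

    def triples(self) -> List[Tuple[str, str, str]]:
        # Resolve node indices to class names for a human-readable view.
        return [
            (self.objects[r.subject], r.predicate, self.objects[r.obj])
            for r in self.relations
        ]


# Two separate "sheep" nodes: the graph keeps them apart by index,
# whereas a caption would have to rely on wording to distinguish them.
graph = SceneGraph(
    objects=["sheep", "sheep", "grass", "tree"],
    relations=[
        Relation(subject=0, predicate="standing on", obj=2),
        Relation(subject=1, predicate="behind", obj=3),
    ],
)

for triple in graph.triples():
    print(triple)
```

Each relation points to a specific object instance, so which sheep stands where is unambiguous. That kind of binding is difficult to guarantee with free-form text.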

Categories
Academic Bats General Programming

🦇BAT – BioAcoustic Transformer

Bachelor Thesis, graded 1.0 (best grade)

Automatically identifying bat species from their echolocation calls is a difficult but crucial task for monitoring bats and the ecosystems they live in. The main issues are high call variability, similarities between species, interfering calls, and a lack of annotated data. This thesis proposes a deep learning approach that tackles these issues with a Transformer-hybrid architecture that uses temporal information and artificially generated interfering calls for multi-label classification. Our method is more efficient than previous methods and has potential for real-time classification scenarios. We achieved a single-species accuracy of 88.92% (F1-score of 84.23%) and a multi-species macro F1-score of 74.40% on our test set. We compared our method to three other tools on an independent, publicly available dataset, which showed that our method achieved at least 25.82% better accuracy for single-species classification and at least 6.9% better macro F1-score for multi-species classification. A web demo with a visualization of the multi-label classification and example files is available at https://bat.hadros.de/. We also created a command-line tool for fast inference on large amounts of data. The entire implementation is open source.
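For readers unfamiliar with the macro F1-score in a multi-label setting, the following minimal sketch computes F1 per species and then averages the per-species scores with equal weight. The species labels and predictions are made up for illustration. Treating a species with no true and no predicted occurrences as F1 = 0 is an assumption of this sketch, and this is not the evaluation code of the thesis.

```python
from typing import List, Sequence, Set


def macro_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]],
             labels: List[str]) -> float:
    """Macro F1 for multi-label data: per-label F1, averaged with equal weight."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # Assumption: a label that never occurs and is never predicted scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical recordings; each set holds the species present in one recording.
labels = ["species_a", "species_b", "species_c"]
y_true = [{"species_a"}, {"species_a", "species_b"}, {"species_c"}, {"species_b"}]
y_pred = [{"species_a"}, {"species_a"}, {"species_b", "species_c"}, {"species_b"}]

score = macro_f1(y_true, y_pred, labels)
print(f"macro F1 = {score:.4f}")  # per-label F1: 1.0, 0.5, 1.0
```

Because every species contributes equally to the average, rare species affect the score as much as common ones. This matters when the annotated data is imbalanced.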