I am a Lecturer/Researcher at University of Applied Sciences Technikum Wien and Doctoral Student at Graz University of Technology, working on machine learning and modern control approaches in robotics as well as personal projects.
This work was supported by the city of Vienna (MA23 – Economic Affairs, Labour and Statistics) through the project Stadt Wien Kompetenzteam für Drohnentechnik in der Fachhochschulausbildung (DrohnFH, MA23 project 35-02).
1 Graz University of Technology, Faculty of Computer Science and Biomedical Engineering, Institute of Software Engineering and Artificial Intelligence, Inffeldgasse 16b/II, 8010 Graz, Austria
2 University of Applied Sciences Technikum Wien, Faculty of Industrial Engineering, Research Group Digital Manufacturing, Automation and Robotics, 1200 Vienna, Austria
![]() |
![]() |
![]() |
Understanding open-world semantics is critical for robotic planning and control, particularly in unstructured outdoor environments. Current vision-language mapping approaches rely on object-centric segmentation priors, which often fail outdoors due to semantic ambiguities and indistinct semantic class boundaries. We propose OTAS—an Open-vocabulary Token Alignment method for Outdoor Segmentation. OTAS overcomes the limitations of open-vocabulary segmentation models by extracting semantic structure directly from the output tokens of pretrained vision models. By clustering semantically similar structures across single and multiple views and grounding them in language, OTAS reconstructs a geometrically consistent feature field that supports open-vocabulary segmentation queries. Our method operates zero-shot, without scene-specific fine-tuning, and runs at up to ≈17 fps. OTAS provides a minor IoU improvement over fine-tuned and open-vocabulary 2D segmentation methods on the Off-Road Freespace Detection dataset. Our model achieves up to a 151% IoU improvement over open-vocabulary mapping methods in 3D segmentation on TartanAir. Real-world reconstructions demonstrate OTAS’ applicability to robotic applications. The code and ROS node will be made publicly available upon paper acceptance.
If you use this work in your research, please cite our paper:
@misc{Schwaiger2025OTAS,
title = {OTAS: Open-vocabulary Token Alignment for Outdoor Segmentation. \textit{arXiv preprint arXiv:2507.08851}},
author = {Simon Schwaiger and Stefan Thalhammer and Wilfried Wöber and Gerald Steinbauer-Wagner},
year = {2025},
url = {https://arxiv.org/abs/2507.08851}
}